Microsoft details Maia 200 inference accelerator

Microsoft has detailed Maia 200, its latest inference accelerator silicon. Built on TSMC 3nm, the chip pairs FP8/FP4 tensor cores with 216GB HBM3e, targeting higher token-throughput per watt in Azure-scale deployments.


In Brief:

  • Maia 200 is Microsoft’s new inference-focused accelerator, fabricated on TSMC’s 3nm process.
  • Key hardware claims include FP8/FP4 tensor cores, 216GB HBM3e at 7 TB/s, and 272MB on-chip SRAM.
  • Microsoft says initial deployment is in Azure’s US Central region, with a Maia SDK preview offering PyTorch integration and a Triton compiler.

Microsoft has unveiled technical detail on Maia 200, its latest in-house AI inference accelerator, and the specifications read like a direct response to the realities of token generation at scale: memory bandwidth, data movement, and system-level networking matter at least as much as peak compute. Maia 200 is built on TSMC’s 3nm process and is designed around low-precision inference workloads, with native FP8 and FP4 tensor processing.

On-package memory is the headline. Microsoft says the redesigned memory system delivers 216GB of HBM3e at 7 TB/s, backed by 272MB of on-chip SRAM, with data movement engines intended to keep large models fed without starving compute. The company also states each chip contains more than 140 billion transistors, and frames performance in practical low-precision terms: over 10 petaFLOPS in FP4 and over 5 petaFLOPS in FP8, within a 750W SoC TDP envelope.
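
Those figures are easy to sanity-check. The back-of-envelope sketch below, in Python, uses only the numbers Microsoft quotes plus a standard roofline-style ratio; the 100B-parameter model is a hypothetical chosen for illustration, not anything Microsoft has stated:

```python
# Back-of-envelope roofline check using only the figures Microsoft quotes.
# The ratio arithmetic is standard; the model size is a hypothetical.

fp4_flops = 10e15        # claimed peak FP4 throughput, FLOPS
fp8_flops = 5e15         # claimed peak FP8 throughput, FLOPS
hbm_bandwidth = 7e12     # claimed HBM3e bandwidth, bytes/s

# Arithmetic intensity needed to saturate compute (FLOPs per byte moved).
# Below this ratio a kernel is memory-bound, not compute-bound.
ridge_fp4 = fp4_flops / hbm_bandwidth   # ~1429 FLOPs/byte
ridge_fp8 = fp8_flops / hbm_bandwidth   # ~714 FLOPs/byte

# Decode-phase LLM inference streams the full weight set per token and does
# roughly 2 FLOPs per weight read, i.e. ~2 FLOPs/byte at 1 byte per FP8
# parameter -- far below the ridge point, so bandwidth sets the token rate.
weights_bytes = 100e9    # hypothetical 100B-parameter model in FP8
tokens_per_s = hbm_bandwidth / weights_bytes
print(f"FP4 ridge point: {ridge_fp4:.0f} FLOPs/byte")
print(f"Bandwidth-bound decode ceiling: ~{tokens_per_s:.0f} tokens/s per replica")
```

At small batch sizes the tensor cores would sit almost entirely idle, which is why the 7 TB/s figure, the SRAM, and the data movement engines are the parts Microsoft leads with rather than peak FLOPS.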

Those are big numbers, but the more telling engineering choices sit in the system design. Microsoft describes a two-tier scale-up network built on standard Ethernet, with a custom transport layer and an integrated NIC intended to avoid reliance on proprietary fabrics. It claims 2.8 TB/s of dedicated bidirectional scale-up bandwidth per accelerator and predictable collective operations across clusters of up to 6,144 accelerators, with four accelerators fully connected inside each tray to keep local traffic local.
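
The bandwidth claim can be turned into a rough latency floor for collectives. A hedged sketch follows, assuming a plain ring all-reduce and an even split of the 2.8 TB/s bidirectional figure; the 1GB payload and the intermediate cluster size are illustrative assumptions, not Microsoft's numbers:

```python
# Bandwidth-only lower bound on a ring all-reduce over the claimed fabric.
# Only the 2.8 TB/s figure comes from Microsoft; everything else is assumed.

def ring_allreduce_seconds(tensor_bytes: float, n: int, link_bytes_per_s: float) -> float:
    """Bandwidth term only: each rank moves 2*(n-1)/n of the tensor."""
    traffic = 2 * (n - 1) / n * tensor_bytes
    return traffic / link_bytes_per_s

link = 2.8e12 / 2            # assume 1.4 TB/s each way of the bidirectional figure
payload = 1e9                # hypothetical 1GB gradient/activation exchange
for n in (4, 64, 6144):      # tray scale, an assumed mid-size pod, claimed max cluster
    t = ring_allreduce_seconds(payload, n, link)
    print(f"n={n:>5}: >= {t*1e6:.0f} us (no latency or congestion term)")
```

Note that the bandwidth term flattens out quickly as n grows, since 2*(n-1)/n approaches 2; at cluster scale it is latency and congestion that dominate, which is presumably why Microsoft emphasises predictable collectives rather than raw link speed.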

Deployment detail matters for engineers trying to map availability to real design programmes. Microsoft says Maia 200 is already deployed in its US Central datacentre region near Des Moines, Iowa, with the US West 3 region near Phoenix, Arizona, next, and more regions to follow. Alongside the hardware, Microsoft is previewing a Maia SDK that it says includes PyTorch integration, a Triton compiler, an optimised kernel library, and access to a low-level programming language, with the aim of making model porting across heterogeneous accelerator fleets less painful.
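
Microsoft has not published the Maia SDK's API surface in this announcement, so the example below is not Maia code. It is a minimal kernel in the open-source Triton language, shown only to indicate the programming model a "Triton compiler" backend would typically consume; every identifier comes from the public triton and torch packages, and nothing here is Maia-specific:

```python
# Minimal open-source Triton kernel: elementwise add over two tensors.
# Shown as a reference for the programming model only; not Maia-specific.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)               # one program instance per block
    offs = pid * BLOCK + tl.arange(0, BLOCK)  # element indices for this block
    mask = offs < n_elements                  # guard the ragged tail block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Tensors must already live on the accelerator device.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)            # enough blocks to cover n
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out
```

The portability pitch rests on exactly this layering: if kernels are expressed in Triton and models stay in PyTorch, retargeting to a new accelerator becomes a compiler and kernel-library problem rather than an application rewrite.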

Maia 200 is another sign that hyperscalers are no longer content to optimise around one vendor’s roadmap, and that the power, cooling, and networking constraints of AI infrastructure are now shaping silicon microarchitecture in plain view.

If Microsoft’s Ethernet-based scale-up approach delivers the utilisation it’s promising, it will pressure the wider ecosystem — not only on raw performance, but on how quickly a platform can be made usable by software teams who have no appetite for bespoke toolchains that die after one generation.

