Positron pushes into AI inference infrastructure

Positron has landed Oracle as an early customer and is preparing its Asimov AI processor for launch in 2027. The design avoids CoWoS and HBM, instead using an LPDDR-based memory architecture and air-cooled deployment to target inference workloads in power-constrained data centres.


In brief:

  • Positron has sold its first chips to Oracle and is deploying systems into Oracle cloud infrastructure for inference workloads.
  • The next-generation Asimov processor is being developed on TSMC N3P with LPDDR-based memory rather than CoWoS-plus-HBM packaging.
  • Air-cooled operation and lower rack-density targets point to a different route into AI infrastructure than the dominant GPU model.

Positron has stepped into a more serious phase of the AI silicon race after securing early deployment with Oracle and outlining the architecture behind its next-generation Asimov processor, a design that aims to sidestep some of the most entrenched bottlenecks in current inference infrastructure.

The company has already sold its first chips to Oracle and is deploying systems and racks into Oracle cloud infrastructure for inference workloads, with a particular emphasis on mixture-of-experts models. That alone is enough to move the story beyond speculative startup noise. Infrastructure customers do not buy on novelty. They buy when a system fits a commercial and operational constraint that mainstream platforms are handling badly or expensively.

Positron’s chosen constraint is not training scale. It is inference economics inside facilities that have usable power available, but not at the rack densities and cooling profiles now associated with top-tier GPU deployments. The company says some of its target environments run at 15 kW to 30 kW per rack and remain air cooled, which immediately limits the class of hardware that can be deployed economically. That creates an opening for processors designed around lower thermal density, acceptable token throughput, and a more manageable integration path.

The Asimov chip is central to that strategy. Due in 2027, it is being developed on TSMC’s N3P process and is designed to stay air cooled at roughly 450 W to 500 W per chip. Positron is also taking a different memory route from the now familiar AI accelerator recipe. Instead of pairing advanced packaging with HBM through CoWoS, it is using LPDDR as attached commodity memory on an organic substrate, with a chiplet structure intended to push aggregate bandwidth closer to what high-performance inference needs.
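The rack and chip figures above imply a rough deployment envelope. A back-of-envelope sketch, assuming (hypothetically) that about 30% of a rack's power budget goes to host CPUs, networking, and fans rather than accelerators:

```python
# Rough rack-fit arithmetic for an air-cooled deployment: how many
# ~475 W chips (midpoint of the reported 450-500 W range) fit into
# the 15-30 kW racks the article describes. The 30% overhead share
# is an illustrative assumption, not a Positron figure.

def chips_per_rack(rack_kw: float, chip_w: float = 475.0,
                   overhead_frac: float = 0.30) -> int:
    """Whole chips that fit the accelerator share of a rack's power budget."""
    accel_w = rack_kw * 1000 * (1 - overhead_frac)
    return int(accel_w // chip_w)

for rack_kw in (15, 30):
    print(f"{rack_kw} kW rack -> {chips_per_rack(rack_kw)} chips")
# 15 kW rack -> 22 chips
# 30 kW rack -> 44 chips
```

Even at the low end of the power range, that is a few dozen accelerators per rack without liquid cooling, which is the kind of density the brownfield sites described here can actually host.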

That choice is significant. HBM has become one of the main choke points in AI infrastructure, not only because of cost, but because of packaging complexity, allocation pressure, and the knock-on effect those constraints have on system availability. By using LPDDR and organic substrate technology, Positron is betting that a large part of the inference market does not need to mirror the exact architecture of top-end training hardware to be commercially attractive. If the workload is memory-bound and the bandwidth can be used efficiently enough, there is room for a different performance-per-dollar equation.

The company is also highlighting memory bandwidth utilisation as a design advantage, reporting figures above 90%. Whether that ultimately translates into broad competitive success will depend on software maturity, real-world deployment data, and the awkward fact that the AI infrastructure market already has deeply entrenched incumbents. Yet the technical direction is coherent. Rather than trying to out-muscle the dominant GPU vendors head-on, Positron is aiming at the space where power, cooling, and memory availability are becoming as important as absolute peak performance.
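Why utilisation matters so much for inference can be made concrete. In memory-bound decode, each generated token must stream the active weights from memory, so throughput is bounded by effective bandwidth over bytes read per token. A minimal sketch with illustrative numbers (the bandwidth and model sizes below are assumptions, not Positron specifications):

```python
# Memory-bound decode bound: tokens/s <= effective bandwidth / bytes
# read per token. All figures are hypothetical, for illustration only.

def tokens_per_sec(bw_gb_s: float, util: float,
                   active_params_b: float,
                   bytes_per_param: float = 1.0) -> float:
    """Upper bound on single-stream decode rate for a memory-bound model.

    bw_gb_s        -- peak memory bandwidth in GB/s
    util           -- achieved fraction of peak bandwidth (0..1)
    active_params_b -- parameters touched per token, in billions
                       (for MoE, the active experts, not the full model)
    bytes_per_param -- 1.0 for FP8/INT8 weights, 2.0 for FP16
    """
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return (bw_gb_s * 1e9 * util) / bytes_per_token

# Same hypothetical 2 TB/s memory system, 30B active parameters:
print(tokens_per_sec(2000, 0.90, 30))  # 90% utilisation -> 60.0 tok/s
print(tokens_per_sec(2000, 0.60, 30))  # 60% utilisation -> 40.0 tok/s
```

The point of the sketch is that at fixed peak bandwidth, moving utilisation from 60% to 90% is a 1.5x throughput gain, which is why utilisation, not just raw HBM-class bandwidth, is a sensible axis for an LPDDR-based design to compete on.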

That makes the Oracle deployment more interesting than a single customer name might suggest. Cloud operators are under pressure to monetise existing power envelopes and physical plant rather than wait for every site to be rebuilt around liquid cooling and ever higher rack densities. A processor that can fit into brownfield infrastructure, remain air cooled, and avoid the worst of the HBM supply problem has a plausible route into the market, particularly for inference workloads where deployment scale matters more than chasing benchmark theatre.

The broader AI silicon market is also shifting in ways that favour that kind of approach. Training remains capital intensive and concentrated. Inference, by contrast, is fragmenting across cloud, enterprise, telecoms, and edge infrastructure, each with different constraints around latency, memory footprint, utilisation, and total cost of ownership. That opens space for more specialised architectures, provided they arrive with enough software compatibility and enough manufacturing discipline to win customer confidence.

Positron is still operating in a market dominated by much larger players. That has not changed. What has changed is that the company now has a clearer architectural position and an early infrastructure deployment that gives its next move more weight. In a sector crowded with alternatives that promise disruption but offer little operational substance, that is a better place to be than most.

