In brief:
- Lumai Iris Nova is designed to run billion-parameter AI inference workloads using optical matrix multiplication.
- The system targets power-constrained data-centre environments, with Lumai citing up to 90% lower energy use than conventional architectures.
- The launch adds another architecture to the AI accelerator market as inference workloads grow in scale and energy demand.
Lumai has introduced Iris Nova, the first generation of its Iris optical inference server platform, as the UK startup moves its lens-based computing architecture toward commercial evaluation in data-centre environments.
The system is designed to run billion-parameter large language models in real time, including Llama 8B and 70B workloads, using optical matrix multiplication rather than a purely transistor-based accelerator architecture. Lumai says the system is available for inference performance evaluation and is intended to fit conventional rack infrastructure rather than requiring a wholesale redesign of the data-centre stack.
Iris Nova forms part of a wider product family that also includes Iris Aura and Iris Tetra, higher-throughput platforms aimed at power-constrained inference deployments. The company has highlighted a three-tier memory architecture for key-value cache handling, a compact optical engine capable of matrix multiplication up to 2048 × 2048, and INT4/INT8-equivalent precision for AI inference workloads.
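To make the 2048 × 2048 figure concrete, the sketch below shows how a larger weight matrix could be split into blocks that each fit an engine of that size. It is an illustration only: Lumai has not published how Iris Nova schedules work, and the function names and the 4096 × 14336 example shape are assumptions chosen for the arithmetic.

```python
import math
from typing import Iterator


def tile_grid(rows: int, cols: int, tile: int = 2048) -> tuple[int, int]:
    """Return how many tiles are needed along each dimension.

    `tile` defaults to the 2048 x 2048 maximum quoted for the optical engine;
    the tiling scheme itself is a hypothetical illustration.
    """
    if rows <= 0 or cols <= 0 or tile <= 0:
        raise ValueError("rows, cols and tile must be positive")
    return math.ceil(rows / tile), math.ceil(cols / tile)


def tile_bounds(rows: int, cols: int, tile: int = 2048) -> Iterator[tuple[int, int, int, int]]:
    """Yield (row_start, row_end, col_start, col_end) for every tile.

    Edge tiles are clipped so they never run past the matrix boundary.
    """
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            yield r, min(r + tile, rows), c, min(c + tile, cols)


if __name__ == "__main__":
    # Illustrative shape only; not a published Iris Nova workload.
    grid = tile_grid(4096, 14336)
    print(grid, "tiles:", grid[0] * grid[1])
```

Under these assumptions a 4096 × 14336 matrix needs a 2 × 7 grid, or 14 engine-sized blocks, which is why the memory and data-movement path feeding the engine matters as much as the multiply itself.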
Lumai says the architecture uses three-dimensional optical parallelism to increase throughput while reducing power draw. The company is targeting up to 100 TOPS/W and operation within a 10 kW power envelope, placing the system inside the current engineering debate around AI infrastructure power density, cooling, and total cost of ownership.
Optical computing is being applied here as a heterogeneous accelerator rather than as a complete replacement for digital silicon. Digital processing still handles control, data movement, system integration, and software workflows, while the matrix-heavy part of inference is shifted into an optical domain where parallel operations can be performed through light propagation.
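The division of labour described above can be sketched in a few lines. In this hypothetical example the digital host quantises weights and activations to 8-bit integers, hands only the multiply-accumulate step to an accelerator, and rescales the result afterwards. Every name here is invented for illustration, `matvec_offload` is a plain software stand-in rather than any Lumai interface, and symmetric per-tensor quantisation is an assumption, since the company has described the precision only as INT4/INT8-equivalent.

```python
def quantize(values: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Symmetric per-tensor quantisation (an assumed scheme, done on the digital host)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def matvec_offload(q_matrix: list[list[int]], q_vector: list[int]) -> list[int]:
    """Stand-in for the accelerator: integer multiply-accumulate and nothing else."""
    return [sum(w * x for w, x in zip(row, q_vector)) for row in q_matrix]


def linear_layer(matrix: list[list[float]], vector: list[float], bits: int = 8) -> list[float]:
    """Host-side wrapper: quantise, offload the matrix-vector product, dequantise."""
    width = len(matrix[0])
    flat = [w for row in matrix for w in row]
    q_flat, w_scale = quantize(flat, bits)
    q_matrix = [q_flat[i:i + width] for i in range(0, len(q_flat), width)]
    q_vector, x_scale = quantize(vector, bits)
    accumulators = matvec_offload(q_matrix, q_vector)  # the only step that leaves the host
    return [a * w_scale * x_scale for a in accumulators]


if __name__ == "__main__":
    weights = [[1.0, -2.0], [0.5, 0.25]]
    activations = [1.0, 0.5]
    print(linear_layer(weights, activations))
```

The output approximates the floating-point product to within quantisation error. The point of the structure is that control flow, scaling, and data layout all stay in ordinary software, while the dense arithmetic sits behind one narrow call that a different computing medium could serve.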
AI inference is now a primary driver of data-centre capacity planning. Training generates the largest bursts of compute demand, but inference creates continuous operational load as deployed models respond to live requests. That makes energy efficiency, memory bandwidth, and data movement central design constraints rather than secondary optimisation targets.
Conventional GPU and accelerator vendors have responded with high-bandwidth memory, chiplet architectures, denser packaging, and more aggressive power management. Lumai is approaching the same bottleneck through a different computing medium. The engineering test will be whether the optical core, data conversion, memory access, precision management, and manufacturability can operate together at useful scale.
Compatibility with existing data-centre environments will be a critical part of that evaluation. New compute architectures struggle when they require new software assumptions, facility layouts, or operational processes. A server that can be assessed in familiar infrastructure can be judged on workload performance, energy use, and deployment fit rather than treated as a standalone research system.
Optical AI acceleration is unlikely to displace digital processors across the full AI pipeline. Its nearer-term role is more specific: high-volume inference where matrix operations dominate and the power budget is tight enough to justify architectural change. If Iris Nova proves its performance outside controlled demonstrations, it will add a serious option to a processor market already fragmenting around GPUs, AI ASICs, NPUs, and application-specific accelerators.



