09/20/2024 | Press release | Distributed by Public on 09/19/2024 16:45
Did you know that optical transceivers, which convert signals between electrical and optical domains for transmission through fibre optic cables, consume a significant portion of system power at higher data rates?
For example, 800G medium-reach optics can consume 30 to 40% of a 28.8Tbps switch's power. This power usage adds up in large data centres (DCs) with hundreds of switches connecting thousands of GPUs and front-end servers.
On the transmit side, the transceiver contains a Retimer / Digital Signal Processor (DSP) followed by an optical engine that converts electrical signals to the optical domain. The Retimer / DSP recovers the signal and clock from the electrical interface and applies pre-compensation techniques to counteract the impairments that the signal might encounter during transmission. This helps to preserve signal quality over long distances. Similarly, on the receiving side, the optical engine converts the optical signals back to the electrical domain and feeds them to DSP circuitry, which applies equalization algorithms to correct the distortions picked up in transit, restoring the signal's integrity.
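To make the equalization step concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration: the two-tap channel, the fixed 8-tap feed-forward equalizer (FFE), and the helper names are hypothetical, and real transceiver DSPs use adaptive and far more elaborate schemes. It only shows the principle that a filter approximating the inverse of the channel can undo inter-symbol interference before the receiver decides which PAM4 level was sent.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)


def fir_filter(signal, taps):
    """Causal FIR filter: out[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def slice_pam4(samples):
    """Decide each sample by picking the nearest PAM4 level."""
    return [min(PAM4_LEVELS, key=lambda level: abs(level - s)) for s in samples]


def symbol_errors(sent, decided):
    """Count positions where the decided symbol differs from the sent one."""
    return sum(1 for a, b in zip(sent, decided) if a != b)


rng = random.Random(0)  # fixed seed so the run is repeatable
tx_symbols = [rng.choice(PAM4_LEVELS) for _ in range(2000)]

# Hypothetical channel: each symbol leaks half its amplitude into the next one.
channel_taps = [1.0, 0.5]
rx_raw = fir_filter(tx_symbols, channel_taps)

# Assumed equalizer: the inverse of (1 + 0.5 z^-1), truncated to 8 taps.
ffe_taps = [(-0.5) ** k for k in range(8)]
rx_equalized = fir_filter(rx_raw, ffe_taps)

errors_raw = symbol_errors(tx_symbols, slice_pam4(rx_raw))
errors_equalized = symbol_errors(tx_symbols, slice_pam4(rx_equalized))
print("symbol errors without equalization:", errors_raw)
print("symbol errors with equalization:", errors_equalized)
```

In this toy setup the unequalized signal is misread whenever the previous symbol was at an outer level, while the equalized signal is decided correctly. Noise, jitter, and optical nonlinearity are deliberately left out.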
DSP circuitry consumes about 50% of the transceiver power. By beefing up the host ASIC's SerDes with some DSP functions and removing the DSP circuitry completely from optics, we can reduce the optics power by 50% and overall system power for driving the signals by 25%. These non-DSP optics are called Linear-drive Pluggable Optics (LPOs). There are some challenges with this approach:
Linear Receiver Optics
A new technology - Linear Receiver Optics (LRO) - gained some traction (YouTube) at the recent OFC 2024 conference. These transceivers keep standard DSP circuitry on the transmit side, which ensures compliance with industry standards like IEEE 802.3 and enables the optics to drive longer distances at higher rates. However, they eliminate the DSP on the receiving side. This offers a middle ground for power efficiency, latency, and performance. LROs have the potential to scale to >800Gbps speeds. Several companies demonstrated 800Gbps LRO at OFC 2024.
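A rough way to compare the three options is to plug the article's percentages into a small Python sketch. The 50% DSP share and the 30 to 40% optics share come from the text above; the even split of DSP power between transmit and receive is purely an assumption made here for illustration, as are the function and constant names. The sketch also ignores any extra power the host SerDes spends on the DSP functions it takes over.

```python
DSP_SHARE_OF_TRANSCEIVER = 0.50  # from the article: DSP is about 50% of transceiver power
TX_SHARE_OF_DSP = 0.50           # ASSUMPTION: DSP power splits evenly between Tx and Rx


def relative_transceiver_power(tx_dsp: bool, rx_dsp: bool) -> float:
    """Transceiver power relative to a fully DSP-based module (which is 1.0)."""
    power = 1.0 - DSP_SHARE_OF_TRANSCEIVER  # optical engine, drivers, amplifiers
    if tx_dsp:
        power += DSP_SHARE_OF_TRANSCEIVER * TX_SHARE_OF_DSP
    if rx_dsp:
        power += DSP_SHARE_OF_TRANSCEIVER * (1.0 - TX_SHARE_OF_DSP)
    return power


def switch_power_saving(optics_share_of_switch: float, transceiver_power: float) -> float:
    """Fraction of total switch power saved by moving to lower-power optics."""
    return optics_share_of_switch * (1.0 - transceiver_power)


variants = {
    "DSP optics": relative_transceiver_power(tx_dsp=True, rx_dsp=True),
    "LRO": relative_transceiver_power(tx_dsp=True, rx_dsp=False),
    "LPO": relative_transceiver_power(tx_dsp=False, rx_dsp=False),
}

for name, power in variants.items():
    low = switch_power_saving(0.30, power)
    high = switch_power_saving(0.40, power)
    print(f"{name}: {power:.2f}x transceiver power, "
          f"{low:.0%} to {high:.0%} of switch power saved")
```

Under these assumptions an LRO lands halfway between a DSP-based module and an LPO. The savings are expressed as a share of total switch power, which is a different baseline from the signal-driving power that the 25% figure refers to.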
Co-Packaged Optics
Co-Packaged Optics (CPO) technology refers to the integration of optical transceivers directly into the ASIC package of the host chip, with fibres coming out of the package. Fibre optic cables can connect to these through specialized connectors. This is made possible by silicon photonics, which integrates all optical components of the transceiver, except the laser, into a single CMOS Photonic Integrated Circuit (PIC). Using advanced packaging techniques, the PIC is bonded to the Electronic Integrated Circuit (EIC), which contains the transceiver's drivers and amplifiers, and to the host ASIC.
Advantages of CPOs:
However, there are several challenges!
This year, discussions at Hot Chips 2024 focused on the Optical Compute Interconnect (OCI), which refers to the use of CPOs inside GPU or CPU packages to build scale-up systems. This integration presents challenges, mainly because GPUs often use 2.5D packaging for High Bandwidth Memory (HBM). Broadcom demonstrated this technology with a test chip that emulates the GPU and an HBM on a silicon interposer, with a PIC/EIC mounted directly on the substrate.
At OFC 2024, Intel demonstrated a 4Tbps OCI chiplet with an integrated laser in the package, with data rates of 64x32Gbps (2Tbps each way). Conversely, Broadcom's optical engine chiplet has 6.4Tbps capability (64x100Gbps) with the laser positioned externally as a pluggable module, using the fibre to transmit light for modulation inside the ASIC.
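The headline figures follow directly from lane count times per-lane rate. A short Python check (the function and variable names are ours, not from either vendor) shows where the rounded numbers come from:

```python
def aggregate_tbps(lanes: int, gbps_per_lane: float) -> float:
    """Aggregate bandwidth in Tbps for a number of lanes at a given per-lane rate."""
    return lanes * gbps_per_lane / 1000.0


intel_each_way = aggregate_tbps(64, 32)      # 2.048 Tbps, quoted as 2Tbps each way
intel_bidirectional = 2 * intel_each_way     # 4.096 Tbps, quoted as 4Tbps
broadcom_engine = aggregate_tbps(64, 100)    # 6.4 Tbps

print(f"Intel OCI chiplet: {intel_each_way:.3f} Tbps each way, "
      f"{intel_bidirectional:.3f} Tbps bidirectional")
print(f"Broadcom optical engine: {broadcom_engine:.1f} Tbps")
```

Note that the Intel figure counts both directions, so comparing it with the Broadcom figure requires knowing how the latter is counted, which the numbers above do not state.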
Jensen Huang's talk at GTC 2024 highlighted 20% power savings from using 200Gbps copper cables (instead of optical cables) for the backplane of the NVL72 chassis to connect the Blackwell GPUs and NVLink switches. OCI may not be power- or cost-efficient for intra-rack links below 200Gbps speeds. However, it may be a good option for link speeds > 200Gbps and for systems spanning multiple racks, where cable lengths can exceed 7-8m. It will be interesting to see whether Nvidia and AMD adopt OCI in their GPUs or transition first to LPOs / LROs, as those do not need complex ASIC packages.
These proofs of concept are compelling and, when mature, can enable large-scale systems. But will this transition happen?
However, both LPO and LRO are positioned to significantly impact the future of optical networking by offering power and cost savings. This makes them attractive for high-speed, high-volume applications inside DCs. It will be interesting to watch how soon they become mainstream!
Sharada Yeluri is a Senior Director of Engineering at Juniper Networks, where she is responsible for delivering the Express family of Silicon used in Juniper Networks' PTX series routers. She holds 12+ patents in the CPU and networking fields.
Adapted from the original posts on LinkedIn (1, 2).
The views expressed by the authors of this blog are their own and do not necessarily reflect the views of APNIC. Please note a Code of Conduct applies to this blog.