Mission Apollo: Landing Optical Circuit Switching at Datacenter Scale

Over the past few decades, large hyperscale data centers have revolutionized global-scale Internet services, facilitating the growth of web search, e-commerce, social media platforms, and the public cloud. Recently, machine learning (ML) applications and workloads have further underscored the importance of these large-scale computing capabilities, driving advancements in fields ranging from biology and medicine to Internet services themselves. Datacenter networking delivers the interconnectivity and scale needed to execute these functions and services efficiently.

While industrial best practices have focused on pure packet solutions using Clos topologies for large-scale data center networks, the research community has developed innovative network designs incorporating optical circuit switches (OCSes). These designs offer the dynamism and flexibility that electrical packet switch (EPS)-only networks lack. Traditional networks use a "Clos" topology, also known as a spine and leaf configuration, to connect all servers and racks within a data center. In a spine and leaf architecture, compute resources (racks of servers equipped with CPUs, GPUs, FPGAs, storage, and/or ASICs) are connected to leaf or top-of-rack switches, which then connect through various aggregation layers to the spine.

Traditionally, the spine of this network is built from EPSes, which are standard network switches provided by companies like Broadcom, Cisco, Marvell, and Nvidia. However, these switches consume a significant amount of power.

Apollo is believed to be the first large-scale deployment of optical circuit switching (OCS) for data center networking. The Apollo OCS platform includes an internally developed OCS (Palomar), circulators, and customized wavelength-division-multiplexed (WDM) optical transceiver technology that supports bidirectional links through the OCS and circulators. Apollo has been in production for nearly a decade, serving as the backbone of all Google data center networks and supporting all data center use cases.

The Apollo OCS layer replaces the spine blocks, resulting in significant cost and power savings by eliminating the electrical switches and optical interfaces used in the spine layer. Google uses these optical switches in a direct connect architecture to link the leaves through a patch panel. This method is not packet switching; it functions as an optical cross-connect.
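To make the distinction between packet switching and an optical cross-connect concrete, here is a minimal sketch of a cross-connect as a one-to-one port mapping. The class name `OpticalCrossConnect` and its methods are hypothetical and are not taken from the Apollo paper; the point is only that the switch holds circuits and never inspects packets.

```python
from typing import Dict, Optional


class OpticalCrossConnect:
    """Toy model of an N x N optical circuit switch (hypothetical API)."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self._out_for_in: Dict[int, int] = {}  # input port -> output port
        self._in_for_out: Dict[int, int] = {}  # output port -> input port

    def connect(self, in_port: int, out_port: int) -> None:
        """Set up a circuit; each port can carry at most one circuit."""
        for port in (in_port, out_port):
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} out of range")
        if in_port in self._out_for_in or out_port in self._in_for_out:
            raise ValueError("port already in use")
        self._out_for_in[in_port] = out_port
        self._in_for_out[out_port] = in_port

    def disconnect(self, in_port: int) -> None:
        """Tear down the circuit that starts at in_port."""
        out_port = self._out_for_in.pop(in_port)
        del self._in_for_out[out_port]

    def route(self, in_port: int) -> Optional[int]:
        """Return the output port light from in_port reaches, if any."""
        return self._out_for_in.get(in_port)
```

In this model, changing the topology means calling `disconnect` and `connect` on a handful of ports, rather than recabling or forwarding packets differently.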

The OCS-based network introduces significant flexibility to a typically static Clos-based networking fabric. The OCS layer could be inserted in different parts of the network (parallel to Spine blocks, between TOR switches and AB, or parallel with AB); in Apollo it takes the place of the Spine blocks.

Figure 1(a) shows a traditional data center network architecture, with Spine blocks connecting Aggregation Blocks (ABs). Top-of-rack (TOR) switches are interconnected to the AB using parallel-fiber-based optical transceivers, either short-reach (SR) multi-mode or parallel single-mode (PSM), along with the corresponding fibers.

Evolution of Network Architecture.

For inter-AB connections, Apollo employs single-mode WDM optical transceivers. WDM optics use OCS ports more efficiently than a PSM solution, because multiple wavelengths share one fiber and one port. Circulators are coupled with the optical transceivers to enable bidirectional operation of these single-mode optical links, achieving full duplex communication over each fiber strand and OCS port. This approach halves the number of required fibers and OCS ports.
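The halving described above can be checked with a small calculation. This is a sketch under one simplifying assumption that is not spelled out in the text: every fiber strand that passes through the OCS occupies exactly one OCS port. The function name and the link count in the usage note are hypothetical.

```python
from typing import Tuple


def fibers_and_ports(num_links: int, use_circulators: bool) -> Tuple[int, int]:
    """Return (fiber strands, OCS ports) needed for full-duplex links.

    Without circulators, each link needs one strand per direction.
    With circulators, both directions share a single strand.
    Assumption: one OCS port per strand.
    """
    strands_per_link = 1 if use_circulators else 2
    strands = num_links * strands_per_link
    ocs_ports = strands
    return strands, ocs_ports
```

For example, 64 full-duplex links would need 128 strands and 128 OCS ports without circulators, and 64 of each with them.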

Optical Circuit Switching for Data Centers

Optical circuit switching (OCS) for communications has undergone intense investigation and development, resulting in various optical devices with different physical mechanisms for implementing the switching function. MEMS-based systems have demonstrated the most promise for scaling to the large port counts required for data center applications at acceptable costs, achieving systems with greater than 1000x1000 interconnectivity.

Among the alternatives, piezo-based systems have higher costs due to assembly complexity. Guided wave switches offer the lowest cost and size thanks to integration, but they have limited scalability and high losses. Wavelength-switching-based schemes have been extensively investigated for faster optical switching, primarily focusing on optical packet switching (OPS) and optical burst switching (OBS).

Critical Advantages of MEMS-Based OCS

Data Rate and Wavelength Agnostic: The MEMS-based OCS deflects light from the input port to the desired output port using two arrays of mirrors that can tilt about two axes, enabling three-dimensional steering, as shown in Figure 4. Due to the broadband, passive nature of the optical path, the same OCS hardware can support increased line rates and additional per-port multiplexed wavelengths. This allows the reuse of the OCS across multiple generations of optical transceiver technologies, limited only by the capacity/bandwidth of the single-mode fiber and associated components interfacing with the mirrors.

Low Power Consumption: Without per-packet processing, per-bit energy consumption is significantly lower than that of EPS counterparts. MEMS mirrors are electrically equivalent to a capacitive load, requiring minimal power to maintain positions. A well-designed high-voltage driver circuit can consume only tens of milliwatts of power per mirror and port.

Low Latency: The absence of per-packet processing minimizes added latency, which is primarily determined by speed-of-light propagation delay (~5ns per meter in optical fiber and ~3.3ns per meter in free space). In comparison, an equivalent throughput EPS would add significantly more delay per network hop.
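As a rough illustration of the propagation figures quoted above, the following sketch adds up the delay of a path made of fiber and free-space segments. The per-meter constants come from the text; the function name and the example lengths are hypothetical.

```python
FIBER_NS_PER_M = 5.0        # ~5 ns per meter in optical fiber (from the text)
FREE_SPACE_NS_PER_M = 3.3   # ~3.3 ns per meter in free space (from the text)


def propagation_delay_ns(fiber_m: float, free_space_m: float = 0.0) -> float:
    """Approximate one-way propagation delay in nanoseconds."""
    return fiber_m * FIBER_NS_PER_M + free_space_m * FREE_SPACE_NS_PER_M


# Hypothetical path: 100 m of fiber plus 0.5 m of free space inside the OCS.
delay = propagation_delay_ns(fiber_m=100.0, free_space_m=0.5)
print(f"{delay:.2f} ns")
```

With these assumed lengths the fiber run dominates: 500 ns from fiber versus under 2 ns from the free-space path through the switch.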

Despite these favorable characteristics, data center economics, scale, and performance requirements impose several challenges for OCS hardware:

  • Faster Switching Time: Commercial OCS switching times are typically 10-20ms, constrained by control software and mirror configuration time. Switching faster than this would also require burst mode operation in the optical transceivers, so that receivers can initialize quickly after each reconfiguration.

  • Lower Insertion and Return Loss: The optical link budget is crucial for data center network fabrics, significantly impacting the cost and power consumption of optical transceiver technology.

  • Wide Wavelength Range Operation: To scale beyond multi-Tb/s, data center optical interconnects may need a large channel count WDM, requiring operation across multiple communication bands (O, S, C, L).

  • Non-blocking: A strictly non-blocking switch is desirable to maximize flexibility, minimize network disruption, and reduce the burden on controls and software.

  • Lower Cost: Historically, the cost of integrated MEMS-based OCS has been a barrier to data center use. However, the underlying chip technology is cost-effective due to silicon wafer fabrication. Increased demand from data centers can drive down costs, similar to the reduction seen with Ethernet switch ASICs. The stated aim is to allocate 15% of network costs to support optical switching.

Google claims to have addressed the four major disadvantages of OCS: high upfront cost, insertion loss, reconfiguration time, and lack of drop-in support. With the Apollo project, Google developed a non-blocking 136x136 OCS that is forward and backward-compatible with any bandwidth or wavelength used in its data centers. This switch consumes only 108 watts, compared to the 3,000 watts typically used by a standard 136-port EPS.
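The power figures above can be put side by side with a short calculation. The wattages and port count are the ones quoted in the text; the function name and the returned keys are hypothetical, and the per-port numbers are simple averages rather than measured values.

```python
from typing import Dict

OCS_WATTS = 108.0    # Apollo OCS, from the text
EPS_WATTS = 3000.0   # typical 136-port EPS, from the text
PORTS = 136


def power_summary() -> Dict[str, float]:
    """Compare OCS and EPS power draw overall and averaged per port."""
    return {
        "eps_to_ocs_ratio": EPS_WATTS / OCS_WATTS,
        "savings_fraction": 1.0 - OCS_WATTS / EPS_WATTS,
        "ocs_watts_per_port": OCS_WATTS / PORTS,
        "eps_watts_per_port": EPS_WATTS / PORTS,
    }


for name, value in power_summary().items():
    print(f"{name}: {value:.3f}")
```

On these numbers the OCS draws roughly 28 times less power than the EPS, a saving of about 96%, or about 0.8 W per port versus about 22 W per port.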

To meet the evolving and growing network requirements, future hardware developments in optical technologies should focus on:

1. Larger Port Count OCS: Enables further scale-out with increased striping and improves efficiency through more flexible topology engineering.

2. Faster Switching Speed and/or Smaller Radix, Lower Cost OCS: Facilitates adoption in lower layers of the data center network for shorter, more bursty traffic flows (e.g., TOR to AB traffic) or flexible bandwidth provisioning by adjusting TOR oversubscription ratios.

3. Improved Reliability and Availability: Essential for larger OCS/failure domains and applications requiring higher uptime.

4. Lower Insertion and Return Loss: Extends the optical interconnect roadmap for continued low power consumption and reduced transceiver costs.

References:

[1] Urata, R., Liu, H., Yasumura, K., Mao, E., Berger, J., Zhou, X., Lam, C., Bannon, R., Hutchinson, D., Nelson, D., Poutievski, L., Singh, A., Ong, J., & Vahdat, A. (2022). Mission Apollo: Landing Optical Circuit Switching at Datacenter Scale. arXiv:2208.10041. https://arxiv.org/abs/2208.10041

[2] https://www.semianalysis.com/p/google-apollo-the-3-billion-game
