Marvell’s OCTEON 10 Challenges All Comers For DPU Supremacy

Kurt Marko

This article was originally posted on the Packet Pushers Ignition site on July 9, 2021.

The ascendance of Software Defined Networking (SDN) has catalyzed a renaissance in specialized hardware designed to accelerate workloads and offload them from general-purpose CPUs. Decoupling network transport and services via software-defined abstraction layers lets a new generation of programmable networking hardware mix packet processing and network services. SmartNICs and their evolution into DPUs (Data Processing Units) are an area we closely follow at Packet Pushers, and while many conflate the terms, I like the NIC taxonomy proposed by my colleague Patrick Kennedy.

NVIDIA has gotten the most DPU mindshare after expanding the SmartNICs it acquired from Mellanox into the Bluefield line, but with the announcement of its OCTEON 10, Marvell takes the pole position in DPU technology.

Next-Gen Arm Cores, Cutting-Edge Process Tech Deliver An Impressive Package

Marvell’s OCTEON 10 is the first DPU built on Armv9-architecture Neoverse N2 cores, which are designed for cloud, 5G, edge, and HPC workloads. As I detailed elsewhere, the Neoverse N2 is a significant update to Arm’s mainstream, efficiency-optimized N-series platform and includes several noteworthy improvements:

  • Updated microarchitecture with a 40 percent increase in IPC (instructions per clock cycle) over N1 with better performance-per-watt.
  • SVE2 (vector extensions) to accelerate image processing, cryptography, LTE/5G baseband processing, in-memory databases, and other applications using matrix calculations.
  • Support for up to 128 cores per SoC.
  • Memory partitioning and monitoring (MPAM) to control access to shared system resources, cache and memory bandwidth, along with other security and debugging improvements.

Source: Marvell slidedeck

Marvell further boosts performance and chip density by building the N2 cores and the surrounding SoC modules on TSMC’s 5nm N5 process, the same node used by Apple’s A14 (iPhone 12) and M1 (MacBook, iPad Pro) SoCs. Aside from the Arm cores, cache, and standard I/O (PCIe, DDR5), the principal subsystems on the OCTEON 10 include:

  • A vector packet processing (VPP) engine that parallelizes header lookup and decision logic to improve performance by up to five-fold.
  • In-line crypto and ML processors to accelerate cryptographic and AI algorithms.
  • An integrated 1 Tbps switch that allows various Ethernet configurations including up to 16x50G or dual 400G.
  • Support for 256-bit MACsec (L2 encryption), VXLAN, GRE, and MPLS network overlays, sFlow and IPFIX flow analytics, time-sensitive networking (TSN, critical for low-latency 5G communications), and line-rate telemetry.
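Overlay support is easiest to picture with VXLAN (RFC 7348), which wraps a tenant’s Ethernet frame in an 8-byte header carrying a 24-bit VXLAN Network Identifier (VNI) and sends the result over UDP destination port 4789. The following is a minimal Python sketch of only that header handling, the kind of per-packet work a DPU like the OCTEON 10 performs in hardware. The function names are hypothetical and nothing here reflects Marvell’s software interfaces; the outer Ethernet/IP/UDP headers are assumed to be added elsewhere and are omitted for brevity.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port for VXLAN (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: set when the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # "!B3xI": network byte order, 1 flags byte, 3 zero pad bytes,
    # then a 32-bit word holding the VNI in its top 24 bits.
    return struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, vni << 8)


def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame.

    Hypothetical helper: the outer headers (UDP dst port VXLAN_UDP_PORT)
    would be added by the transmit path and are not modeled here.
    """
    return vxlan_header(vni) + inner_frame


def vxlan_decapsulate(payload: bytes) -> tuple[int, bytes]:
    """Strip the VXLAN header from a UDP payload and return (vni, inner_frame)."""
    flags, vni_word = struct.unpack("!B3xI", payload[:8])
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8, payload[8:]
```

For example, `vxlan_header(5000)` yields `b'\x08\x00\x00\x00\x00\x13\x88\x00'`: the flags byte, three reserved bytes, the VNI 0x001388 (5000), and a final reserved byte.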

The OCTEON switch pairs well with Marvell’s recently announced Alaska 1.6 Tbps PHY, whose 112G SerDes are fabricated on the same 5nm process.

Source: Marvell slidedeck

Marvell claims that the OCTEON 10 “delivers three times the performance and 50 percent lower power compared to previous generations.” Although Marvell offered few details about product availability and pricing, it will be sampling chips later this year and will offer a PCIe 5.0 development card with a 24-core OCTEON 10, 16 GB DDR5 memory, and dual 100 GbE ports. Initial plans show four models with:

  • 8–36 N2 cores
  • 1MB L2 and 2MB L3 cache per core
  • 2–12 DDR5 controllers
  • 4–8 PCIe controllers
  • Various Ethernet configurations
  • Power budget of 10–60 W

Source: Marvell slidedeck

Flexibility To Support Many Scenarios And Applications

The variety of planned SKUs illustrates a significant advantage of Marvell’s modular SoC architecture: the flexibility to scale the number and size of cores and other SoC components to match the requirements of different applications. Marvell highlighted three scenarios in its launch announcement:

  • Cloud and data center servers to offload virtual overlay and cryptographic processing for multi-tenant VM, container, and storage services.
  • LTE and 5G vRAN implementations that pair the OCTEON 10 with Marvell’s Fusion-O baseband processor, with the Fusion-O providing the 5G and LTE-A PHY and the OCTEON handling CU or vRAN offload processing.
  • Enterprise router-firewall and SD-WAN appliances using NFV service chaining to deliver L2/L3 forwarding, VPN termination, SPI, and new AI-based applications and security services.
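NFV service chaining, as in the enterprise scenario above, steers each packet through an ordered sequence of network functions, any of which may modify or drop it. Below is a minimal Python sketch of the concept, assuming a toy stateful-inspection (SPI) firewall followed by a longest-prefix-match L3 router. Every class and function name is hypothetical; a real OCTEON-based appliance would implement these stages in DPDK or hardware, not Python.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int
    next_hop: Optional[str] = None


# A hypothetical network function returns the (possibly updated) packet, or None to drop it.
NetworkFunction = Callable[[Packet], Optional[Packet]]


class StatefulFirewall:
    """Toy SPI: inbound packets pass only if they answer a flow opened from inside."""

    def __init__(self, inside: str):
        self.inside = ipaddress.ip_network(inside)
        self.flows: set[tuple[str, int, str, int]] = set()

    def __call__(self, pkt: Packet) -> Optional[Packet]:
        if ipaddress.ip_address(pkt.src) in self.inside:
            # Outbound: record the flow so the reply is allowed back in.
            self.flows.add((pkt.src, pkt.sport, pkt.dst, pkt.dport))
            return pkt
        # Inbound: allow only the reverse direction of a recorded flow.
        if (pkt.dst, pkt.dport, pkt.src, pkt.sport) in self.flows:
            return pkt
        return None


def make_router(routes: dict[str, str]) -> NetworkFunction:
    """L3 forwarding with longest-prefix match over a small static table."""
    table = sorted(
        ((ipaddress.ip_network(prefix), hop) for prefix, hop in routes.items()),
        key=lambda entry: entry[0].prefixlen,
        reverse=True,  # most specific prefixes are checked first
    )

    def route(pkt: Packet) -> Optional[Packet]:
        addr = ipaddress.ip_address(pkt.dst)
        for net, hop in table:
            if addr in net:
                return replace(pkt, next_hop=hop)
        return None  # no matching route: drop

    return route


def run_chain(chain: list[NetworkFunction], pkt: Packet) -> Optional[Packet]:
    """Pass a packet through each function in order, stopping if one drops it."""
    for fn in chain:
        result = fn(pkt)
        if result is None:
            return None
        pkt = result
    return pkt
```

With `chain = [StatefulFirewall("10.0.0.0/24"), make_router({"0.0.0.0/0": "wan", "10.0.0.0/24": "lan"})]`, an outbound connection from 10.0.0.5 is routed to `"wan"`, its reply is admitted and routed to `"lan"`, and an unsolicited inbound packet is dropped. Keeping each function behind the same signature is what lets the chain be reordered or extended without touching the other stages.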

Source: Marvell slidedeck

Outpacing The Competition, For Now

Although SmartNICs are nothing new, DPUs are a dynamic and rapidly evolving category. They aim to offload not just specialized network functions but entire virtual applications, and all of the major network component vendors have announced DPU products. Aside from internal projects with minimal public documentation like AWS Nitro, Marvell’s principal competitors are:

  • Broadcom Stingray PS1100R is overdue for an upgrade, with only eight Armv8 A72 cores and a single 100 GbE or dual 25 GbE interfaces.
  • Fungible F1 features 52 MIPS64 R6 cores with hardware virtualization, up to eight 100 GbE interfaces, and a P4 programmable parser, DMA, and router. The F1’s versatility makes it suitable for composable infrastructure, security appliances, NVMe-oF storage systems, and edge AI applications.
  • Intel Infrastructure Processing Units (IPUs) are FPGA-based products from Intel and partners (Silicom and Inventec) that span throughputs from dual 25 GbE to quad 100 GbE. The Silicom N5010 is the largest card and targets wireless carriers with 4×100 GbE interfaces and a high-end Altera Stratix DX210 FPGA for NFV, security, and virtual overlay processing.
  • NVIDIA Bluefield-2 and Bluefield-3. Still unreleased, the Bluefield-3 promises 400 Gbps of Ethernet throughput powered by a custom SoC with up to 16 Armv8 A78 cores, 16 GB DDR5, and a host of hardware acceleration modules. For details, see my earlier Packet Pushers Ignition article on DPU-Based Smart Interfaces And The Future Of Network Functions.
  • Pensando Distributed Services Card (DSC-100) uses a custom, programmable (P4 and C) ASIC that can offload overlay/underlay tunneling, security group enforcement, and line-speed (100 GbE) VPN termination. However, like the Stingray, its specs, including a single 100G interface and no general-purpose processor cores, mean it can’t compete with the OCTEON 10.
  • Xilinx Alveo SN1000 is a programmable DPU that combines an UltraScale+ FPGA with a 16-core NXP Arm processor (A72 cores) to offload crypto termination (IPsec, SSL/TLS), NVMe-oF storage services, traffic shaping (QoS), and NFV services. It features dual 100 GbE interfaces.

We’ve focused on the hardware, but as my previous article on NVIDIA’s Bluefield-3 discussed, the software stack is a critical part of any DPU product and an area where NVIDIA excels. Most of its competitors provide interfaces for low-level languages like P4 and C, or an FPGA compiler (HLS, RTL), which sets a high bar for developers.

Marvell promises an open software platform for the OCTEON to facilitate development of network and security functions, VM and container applications, Linux user plane extension, and DPDK functions. Hopefully Marvell and its OEMs can match the features of NVIDIA DOCA, Morpheus, and GPUDirect because it would be a pity to waste the OCTEON 10’s incredible capabilities.

About Kurt Marko: Kurt was an IT analyst, consultant, and regular contributor to a number of technology publications, including Diginomica, TechTarget, and AvidThink. Starting his career as an electrical engineer, Kurt spent 35 years providing deep reporting and analysis in networking and IT. Kurt passed away in January 2022.