Microsoft Takes "Shock And Awe" Approach To New Azure Custom And Merchant Silicon For AI And General Purpose Workloads

At #MSIgnite, Microsoft went all "shock and awe" in one of the biggest displays I have seen from a CSP (Cloud Service Provider) in a while.

Custom silicon:

  1. Microsoft Azure Maia AI Accelerator: Designed for ML and generative AI (GAI) training and inference. You have to assume today's and next-generation OpenAI models will be trained and served on Maia. Sam Altman called it a "co-collaboration" to produce "more capable" and "cheaper" models.

Liquid-cooled Maia
  • Racks are liquid-cooled

  • 4x cards per server

  • ASIC, not a GPU (as expected)

  • no cluster or model size limit

  • "X86 host" (unclear if AMD or Intel)

  • TSMC 5nm (assuming high performance)

  • supports standard MX sub-8 bit data types

  • Ethernet connectivity (embedded by the way)

  • power Microsoft Copilot or Azure OpenAI Service
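The "standard MX sub-8 bit data types" bullet refers to the OCP Microscaling (MX) formats, where a small block of low-bit elements shares a single power-of-two scale. As a rough illustration of that idea only (not Maia's actual implementation; the block size of 32 and shared exponent come from the MX spec, while the integer element encoding and rounding choices here are mine), a sketch:

```python
import math

BLOCK = 32  # MX formats share one scale across a 32-element block


def mx_quantize(block, elem_bits=8):
    """Quantize a block of floats to ints sharing one power-of-two scale."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # Pick a shared exponent so the largest element lands near the
    # top of the signed elem_bits integer range.
    shared_exp = math.floor(math.log2(max_abs)) - (elem_bits - 2)
    scale = 2.0 ** shared_exp
    qmax = 2 ** (elem_bits - 1) - 1
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return shared_exp, q


def mx_dequantize(shared_exp, q):
    """Recover approximate floats from the shared exponent and int elements."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in q]
```

The win is that only one scale per 32 elements is stored, so most of the tensor lives in sub-8-bit (or 8-bit) elements while still tracking each block's dynamic range.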

  2. Microsoft Azure Cobalt CPU: Arm-based SoC for general-purpose computing.

Arm-based Cobalt 100
  • "up to 40% perf/core versus previous Arm server"

  • 128 cores

  • 12 DDR5 channels

  • Arm Neoverse N2-based core

  • Microsoft-designed features for security and power management

Overall Custom details:

  • Custom Approach: full stack, including rack, fleet, and optimizations like liquid cooling and networking; Microsoft did all chip and SoC engineering through tape-out

  • Custom Drivers: supply chain resilience, cost, highest performance

  • Custom Timeframe: early next year

  • Custom History: Custom console silicon in the 2010s, then Cerberus, then Azure Boost, and now Maia and Cobalt.

New merchant silicon:

  1. NC H100 v5 VM Preview: NVIDIA H100-based VM for mid-range training and GAI inference. Microsoft will add the H200 next year.

  2. ND MI300 VM: AMD Instinct MI300X-based VM.

This announcement on the custom silicon was more than I had imagined. It included both custom general-purpose compute and ML/GAI training and inference silicon. Microsoft Azure is smart to roll this out to SaaS and PaaS first, followed by IaaS for everybody, as it lowers risk.

I wanted to see a vertical approach, and this effort was a ground-up, full-stack one. I can't figure out how more of this didn't leak: the company says it went all the way from IP and SoC design to tape-out and test, working directly with TSMC. Hence no SoC integrators, as it used for console SoCs with AMD and as others do with Samsung Semiconductor custom design services and Marvell Technology custom capabilities.

Kudos for bringing out the MI300X, as it is GPU-based. AMD has been "next in line" to NVIDIA for big AI business. I hear about AMD literally everywhere. This is the first of what I expect will be many AMD MI300X announcements.

Looking forward to further information on:

  • Performance versus merchant AMD and Intel silicon and custom AWS and Google Cloud silicon.

  • Pricing for custom and AMD MI300X silicon

  • Dates for full IaaS and SaaS apps like M365, D365, etc.

  • Official GA dates

Overall, I was very impressed and didn't expect all this at once.

Tickers: $MSFT $NVDA $AWS $GOOGL $INTC $AMD
