Cisco UCS X-Series for AI/ML: X9508, H100, Intersight Design (2026)
The Cisco UCS X-Series Modular System, with its X9508 chassis and Intersight-managed infrastructure, represents a significant shift from the established UCS B-Series and UCS 5108 architecture. For AI/ML workloads, this evolution matters primarily because the X-Fabric can host high-power, high-bandwidth components like NVIDIA H100/H200 GPUs. This article dissects the X-Series' suitability for demanding AI environments, focusing on the X9508, compute nodes like the X210c M7 and X410c M7, and the GPU-specialized X440p PCIe node. We'll detail the architectural advantages, networking with 6536 Fabric Interconnects, and the operational paradigm under Intersight, including brownfield migration considerations and Day-2 operations.
UCS X-Series X9508 Chassis and X-Fabric Architecture for AI
The UCS X9508 chassis is the foundational element. It is a 7RU enclosure that supports up to eight nodes and operates without a traditional midplane. Instead, nodes plug directly into the modules at the rear of the chassis: Intelligent Fabric Modules (IFMs), which uplink to the 6536 Fabric Interconnects (FIs), and X-Fabric modules, which carry PCIe traffic to GPU-bearing X440p nodes. This removes the fixed-bandwidth midplane of older blade designs, so the fabric can be upgraded without replacing the chassis. For AI, this means high-bandwidth PCIe paths to GPUs and high-throughput network interfaces. The power subsystem, often overlooked until brownfield deployment, is crucial here. Each X9508 chassis, especially when fully populated with X440p nodes carrying NVIDIA H100/H200 GPUs, can draw upwards of 12-14 kW. This necessitates careful planning for rack PDU selection (e.g., APC AP8867 for 208V/30A or AP8981 for 400V/32A 3-phase) and datacenter power density, typically requiring hot-aisle containment or liquid cooling within a 48U rack, given the thermal envelopes of modern AI accelerators.
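The PDU planning above comes down to simple circuit arithmetic. The following is a minimal sketch, not a sizing tool: it assumes both circuit types are three-phase, applies an 80% continuous-load derating (a common North American practice that may not apply in other regions), and treats power factor as 1.0. The function names are hypothetical.

```python
import math

def three_phase_kw(volts_line_to_line: float, amps: float,
                   derate: float = 0.8, power_factor: float = 1.0) -> float:
    """Usable kW from a three-phase circuit: sqrt(3) * V * I * derate * PF."""
    watts = math.sqrt(3) * volts_line_to_line * amps * derate * power_factor
    return watts / 1000.0

def circuits_needed(load_kw: float, circuit_kw: float, redundant: bool = True) -> int:
    """Circuits required to carry load_kw; doubled when A/B feeds are required."""
    n = math.ceil(load_kw / circuit_kw)
    return 2 * n if redundant else n

chassis_kw = 14.0                  # upper end of the 12-14 kW range quoted above
kw_208 = three_phase_kw(208, 30)   # assumed three-phase 208V/30A circuit
kw_400 = three_phase_kw(400, 32)   # three-phase 400V/32A circuit

print(f"208V/30A usable: {kw_208:.2f} kW, circuits: {circuits_needed(chassis_kw, kw_208)}")
print(f"400V/32A usable: {kw_400:.2f} kW, circuits: {circuits_needed(chassis_kw, kw_400)}")
```

Under these assumptions a single 14 kW chassis needs two 208V/30A circuits per feed but only one 400V/32A circuit per feed, which is why higher-voltage distribution tends to be attractive for GPU-dense racks.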
Unlike the UCS 5108 blade chassis, which relied on Fabric Extenders (FEXs) like the 2204XP or 2208XP connected to upstream Fabric Interconnects (e.g., 6248UP, 6332-16UP), the X9508 uses IFMs that uplink to the 6536 Fabric Interconnects over multiple 25Gbps or 100Gbps links. Each compute node, such as the X210c M7 or X410c M7, gets its network and storage connectivity from a Cisco VIC adapter (14425, 15420, or 15422). The VIC is separate from the X-Fabric module, which carries PCIe rather than Ethernet. For GPU-intensive X440p nodes, the PCIe lanes are critical. The X-Fabric design lets each X440p node use its PCIe lanes to its onboard GPUs without a shared midplane in the path, delivering the low latency and high bandwidth needed for GPUDirect RDMA (Remote Direct Memory Access) and for NVLink-bridged GPUs within multi-GPU nodes.
Compute Node Options: X210c, X410c, and X440p for AI
The UCS X-Series offers several compute node options, each tailored for different workload profiles. The X210c M7 (2 sockets, balanced configuration) and X410c M7 (4 sockets, higher core count and memory capacity) are based on Intel Xeon Scalable 4th Gen (Sapphire Rapids) or 5th Gen (Emerald Rapids) processors, providing a robust general-purpose compute foundation. However, for serious AI/ML, the X440p PCIe Node is the workhorse. This node accommodates up to four PCIe GPU accelerators such as the NVIDIA H100 Tensor Core GPU, H200 (Hopper with HBM3e), or L40S. The supported count depends on card width and riser option, so check the X440p spec sheet before planning around four double-width cards per node. The X440p provides a high-bandwidth PCIe fabric directly to these GPUs, which keeps data transfer rates between CPU and GPU high for iterative training loops and large model inference.
The decision between CPU variants (e.g., Xeon Platinum 8480+ vs. 8580) and GPU types (H100 vs. A100 vs. L40S) depends directly on the specific AI workload. For large language model (LLM) training and inference, H100/H200 are preferred for their Tensor Core performance and HBM bandwidth. For graphics rendering or smaller inference tasks, L40S might be more cost-effective. Each X440p with H100s can draw a large amount of power, often 4-5 kW per node. This is a critical factor for rack sizing, cooling, and power distribution units (PDUs). A typical 42U rack might only house two fully populated X9508 chassis with X440p nodes due to these constraints. Where NVLink bridges are fitted between GPUs in an X440p, GPU-to-GPU traffic bypasses the CPU and the server's PCIe fabric, minimizing latency for collective operations in distributed training. This is a significant advantage over architectures where GPU-to-GPU communication must traverse the CPU's primary PCIe complex.
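The "two chassis per 42U rack" figure follows from taking the tighter of two limits, rack space and rack power. A minimal sketch of that check is shown here. The 7RU chassis height is the published X9508 form factor, while the 30 kW rack budget and the 4U reserved for switches and PDUs are assumptions chosen for illustration.

```python
def chassis_per_rack(rack_u: int, rack_kw: float, chassis_u: int = 7,
                     chassis_kw: float = 14.0, reserved_u: int = 4) -> int:
    """Chassis that fit in a rack, limited by both rack units and power budget."""
    by_space = (rack_u - reserved_u) // chassis_u   # physical fit
    by_power = int(rack_kw // chassis_kw)           # electrical and cooling fit
    return min(by_space, by_power)

# Hypothetical 42U rack with a 30 kW power and cooling budget.
print(chassis_per_rack(42, 30.0))    # power-limited
# Same rack with an (unusually generous) 100 kW budget.
print(chassis_per_rack(42, 100.0))   # space-limited
```

With these inputs the rack is power-limited at two chassis even though five would fit physically, so the rack power budget is usually the number to negotiate with the facilities team.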
Networking with 6536 Fabric Interconnects (FIs)
The 6536 Fabric Interconnects are the cornerstone of the X-Series' high-performance network. Each FI is a 36-port switch whose ports run at 10/25/40/100 Gbps, providing the necessary bandwidth for AI workloads. Each X9508 chassis typically connects to a pair of 6536 FIs for redundancy, using multiple 100Gbps links. For AI, especially distributed training, a flat, high-bandwidth Ethernet network is essential. RoCEv2 (RDMA over Converged Ethernet v2) is often implemented over these FIs to provide low-latency communication between compute nodes and storage, which is crucial for parallel file systems like BeeGFS or Lustre and for high-performance NFSv4.1 deployments using pNFS.
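When planning a flat fabric for distributed training, two quick numbers are worth computing: the oversubscription ratio at each tier and the network bandwidth available per GPU. The sketch below shows both. The link counts are hypothetical placeholders, not values from a validated design.

```python
def oversubscription(down_links: int, down_gbps: float,
                     up_links: int, up_gbps: float) -> float:
    """Ratio of server-facing to uplink bandwidth; 1.0 means non-blocking."""
    return (down_links * down_gbps) / (up_links * up_gbps)

def per_gpu_gbps(node_links: int, link_gbps: float, gpus_per_node: int) -> float:
    """Network bandwidth each GPU can claim if all GPUs in a node transmit at once."""
    return node_links * link_gbps / gpus_per_node

# Hypothetical FI: 16 x 100G chassis-facing links, 8 x 100G uplinks to the leaf layer.
ratio = oversubscription(16, 100, 8, 100)
# Hypothetical node: 2 x 100G (one link per fabric) shared by 4 GPUs.
share = per_gpu_gbps(2, 100, 4)

print(f"oversubscription {ratio:.1f}:1, {share:.0f} Gbps per GPU")
```

A ratio above 1:1 is not automatically a problem, but for all-reduce-heavy training it is a sign that uplink counts or job placement need a closer look.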
The 6536 FIs run in Intersight Managed Mode (IMM) rather than the traditional UCS Manager (UCSM) domain mode. This simplifies operations by integrating FI management directly into Intersight, eliminating the need for separate UCSM clusters. Each 6536 provides 36 ports at 10/25/40/100 Gbps, with breakout options for lower-speed links. For AI/ML, these ports are typically uplinked to a high-performance spine/leaf architecture using Nexus 9300-FX3 or 9500 series switches. Bandwidth allocation becomes critical; a 64-GPU H100 pod, for example, requires significant aggregate bandwidth to storage and between nodes. The following snippet illustrates a 6536 port configuration in UCSM-style CLI. Treat it as illustrative: under IMM, port roles and breakouts are normally defined through an Intersight port policy rather than typed on the FI.
```
Fabric-Interconnect # scope ethernet interface breakout 1/9
Fabric-Interconnect /ethernet/interface/breakout # create breakout-ports 4x100g
Fabric-Interconnect /ethernet/interface/breakout # commit-buffer
Fabric-Interconnect # scope fc uplinks
Fabric-Interconnect /fc/uplinks # create uplink-port 1/17
Fabric-Interconnect /fc/uplinks # commit-buffer
Fabric-Interconnect # show fabric-interconnect inventory
Chassis:
    Name: FI-A
    Model: UCS-FI-6536
    Serial: FoxXXXXXX
    Firmware: 6.2(6)
    Status: Up
... (truncated)
```
For brownfield deployments, migrating from an existing UCSM domain to IMM requires a complete re-provisioning of the FIs and servers. This is not an in-place upgrade. Careful planning of network VLANs, storage vSANs, and IP addressing is essential to avoid service disruption. It's often recommended to build a completely separate X-Series based IMM domain and then migrate workloads, or deploy a greenfield environment dedicated to AI.
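Before building the parallel IMM domain, it helps to diff the planned VLAN and subnet allocations against the existing UCSM domain so that conflicts surface on paper rather than during cutover. The sketch below uses only the Python standard library. The VLAN IDs, names, and subnets are hypothetical, and the inventories are assumed to have been exported by hand or by script.

```python
import ipaddress

def vlan_conflicts(old: dict[int, str], new: dict[int, str]) -> list[int]:
    """VLAN IDs defined in both domains but under different names."""
    return sorted(v for v in old.keys() & new.keys() if old[v] != new[v])

def overlapping_subnets(old: list[str], new: list[str]) -> list[tuple[str, str]]:
    """Pairs of (old, new) subnets whose address ranges overlap."""
    pairs = []
    for a in old:
        for b in new:
            if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                pairs.append((a, b))
    return pairs

# Hypothetical inventories for the existing UCSM domain and the planned IMM domain.
ucsm_vlans = {100: "mgmt", 200: "storage"}
imm_vlans = {100: "mgmt", 200: "roce", 300: "gpu-fabric"}
ucsm_subnets = ["10.10.0.0/16"]
imm_subnets = ["10.10.20.0/24", "10.20.0.0/24"]

print(vlan_conflicts(ucsm_vlans, imm_vlans))
print(overlapping_subnets(ucsm_subnets, imm_subnets))
```

A reused VLAN ID with a different purpose, or a new subnet carved out of an existing supernet, is exactly the kind of issue that causes service disruption once both domains share the same upstream switches.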
Intersight Management: Server Profiles and GPU Policies
Intersight is the cloud-native management platform for UCS X-Series. All aspects of server configuration, from BIOS settings to network and storage connectivity, are managed through Intersight profiles. This provides consistent, declarative infrastructure as code. For AI workloads, custom server profiles are essential, often including specific BIOS tunings for maximum CPU and PCIe performance, such as disabling C-states, enabling turbo boost, and configuring memory interleave. A dedicated GPU Policy within Intersight allows fine-grained control over GPU settings, including power limits, fan speeds, and even NVIDIA Virtualization features (vGPU) when using hypervisors like VMware vSphere.
A typical Intersight Server Profile for an AI/ML node will include: a Host Firmware Package (ensuring consistency across nodes), a Boot Order Policy (often SAN boot for OS), a LAN Connectivity Policy (defining vNICs, VLANs, QoS for RoCEv2), a SAN Connectivity Policy (for Fibre Channel or iSCSI LUNs), and crucially, a GPU Policy. For NVIDIA H100 GPUs, the GPU Policy specifies whether the GPUs are allocated for a single owner (passthrough) or segmented for vGPU usage. When using vGPU profiles, careful validation with NVIDIA AI Enterprise software stack is required.
```yaml
---
# Intersight Server Profile for AI/ML Node
# Illustrative layout only: verify object and field names against the
# Intersight API reference before use.
apiVersion: v2.0.0/server.serverprofile
kind: ServerProfile
name: my-ai-node-profile
description: "Server Profile for AI/ML X440p compute nodes"
tags:
  - Key: environment
    Value: production
organizations:
  - Moid: "5cc235d6426f4327a3c7cf4c"
clusterModeControl: "None"
srcMemoryTiering: "None"
serverAssignmentMode: "Pool"
serverPool:
  Moid: "65a6f8b1c4bdc6dd10d0c3e7"
bootOrderPolicy:
  Moid: "65a6f8ddc4bdc6dd10d0c5d6"
resourcePools:
  - Moid: "65a6f8b1c4bdc6dd10d0c3e7"
hostFirmwarePackage:
  Moid: "65a6f874c4bdc6dd10d0bf9a"
biosPolicy:
  Moid: "65a6f882c4bdc6dd10d0c0d1"
lanConnectivityPolicy:
  Moid: "65a6f8dac4bdc6dd10d0c53d"
sanConnectivityPolicy:
  Moid: "65a6f8dec4bdc6dd10d0c5d7"
powerPolicyType: "Performance"
virtPlatformEvacuationPolicy: "NoAction"
virtPlatformPolicy:
  HypervisorType: "VMwareEsxi"
  EsxiPolicy:
    Moid: "65a6f8dac4bdc6dd10d0c53d"
gpuPolicy:
  Moid: "65a6f90bc4bdc6dd10d0c8d1"
---
# GPU Policy Example (assuming separate object for definition)
apiVersion: v2.0.0/server.gpupolicy
kind: GpuPolicy
name: nvidia-h100-passthrough
description: "NVIDIA H100 GPU passthrough policy"
tags:
  - Key: gpu_type
    Value: h100
organizations:
  - Moid: "5cc235d6426f4327a3c7cf4c"
gpuType: "NVIDIA"
gpuMode: "Passthrough"
vgpuProfileName: ""
driverInstallMethod: "HostInstall"
```
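A cheap safeguard before pushing a profile like the one above is to lint it locally. The following sketch checks that every policy reference listed earlier is present and that each Moid looks well formed. The key names mirror the illustrative YAML above rather than the real Intersight schema, and the 24-character hexadecimal Moid pattern is an assumption based on the example values.

```python
import re

# Policy references the article lists for an AI/ML node (names follow the sample YAML).
REQUIRED_POLICIES = (
    "hostFirmwarePackage",
    "bootOrderPolicy",
    "lanConnectivityPolicy",
    "sanConnectivityPolicy",
    "gpuPolicy",
)
MOID_RE = re.compile(r"^[0-9a-f]{24}$")  # assumed Moid format

def validate_profile(profile: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the profile passed."""
    problems = []
    for key in REQUIRED_POLICIES:
        ref = profile.get(key)
        if not isinstance(ref, dict) or "Moid" not in ref:
            problems.append(f"missing policy reference: {key}")
        elif not MOID_RE.match(ref["Moid"]):
            problems.append(f"malformed Moid for {key}: {ref['Moid']}")
    return problems

good = {
    "hostFirmwarePackage": {"Moid": "65a6f874c4bdc6dd10d0bf9a"},
    "bootOrderPolicy": {"Moid": "65a6f8ddc4bdc6dd10d0c5d6"},
    "lanConnectivityPolicy": {"Moid": "65a6f8dac4bdc6dd10d0c53d"},
    "sanConnectivityPolicy": {"Moid": "65a6f8dec4bdc6dd10d0c5d7"},
    "gpuPolicy": {"Moid": "65a6f90bc4bdc6dd10d0c8d1"},
}
# Same profile with a truncated boot Moid and no GPU policy attached.
bad = {**good, "bootOrderPolicy": {"Moid": "65a6f8dd"}}
del bad["gpuPolicy"]

print(validate_profile(good))
print(validate_profile(bad))
```

Running a check like this in CI catches copy-paste mistakes in Moids before they turn into failed profile deployments across a whole pool of nodes.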
Automating deployment and Day-2 operations with Intersight is key. Ansible, using the cisco.intersight collection, can manage server profiles, network policies, and other infrastructure components. This ensures repeatability and reduces human error, critical for scaling AI infrastructure. Here's a snippet for an Ansible playbook using the Intersight collection:
```yaml
---
# Policy arguments below are illustrative; confirm the options supported by
# intersight_server_profile in the cisco.intersight collection docs before running.
- name: Configure UCS X-Series Server Profile for AI
  hosts: localhost
  connection: local
  collections:
    - cisco.intersight
  tasks:
    - name: Ensure AI Server Profile exists
      intersight_server_profile:
        api_key_id: "{{ intersight_api_key }}"
        api_private_key: "{{ intersight_secret_key_filepath }}"
        validate_certs: true
        state: present
        name: my-ai-node-profile
        description: "Server Profile for AI/ML X440p compute nodes"
        organizations:
          - Moid: "5cc235d6426f4327a3c7cf4c"
        host_firmware_package:
          Moid: "65a6f874c4bdc6dd10d0bf9a"
        boot_order_policy:
          Moid: "65a6f8ddc4bdc6dd10d0c5d6"
        lan_connectivity_policy:
          Moid: "65a6f8dac4bdc6dd10d0c53d"
        san_connectivity_policy:
          Moid: "65a6f8dec4bdc6dd10d0c5d7"
        gpu_policy:
          Moid: "65a6f90bc4bdc6dd10d0c8d1"
        server_assignment_mode: "Pool"
        server_pool:
          Moid: "65a6f8b1c4bdc6dd10d0c3e7"

    - name: Power on the assigned server
      intersight_server_profile:
        api_key_id: "{{ intersight_api_key }}"
        api_private_key: "{{ intersight_secret_key_filepath }}"
        validate_certs: true
        state: present
        name: my-ai-node-profile
        power_state: "PowerOn"
```
AI PODs, Storage, and Software Stacks
Cisco Validated Designs (CVDs) for AI/ML often define specific hardware and software bundles, branded as AI PODs. These designs integrate UCS X-Series with NVIDIA GPUs, high-performance storage, and AI software platforms. Common stacks include NVIDIA AI Enterprise, Red Hat OpenShift AI (formerly Red Hat OpenShift Data Science), or VMware Private AI Foundation. The choice of storage is paramount: typically NVMe-oF or high-performance NFSv4.1 over RoCEv2. NetApp AFF A900 all-flash controllers (e.g., FC or iSCSI for boot, NFS for data) and Pure Storage FlashBlade //E for high-throughput NFS with pNFS are popular choices because they can sustain the IOPS and bandwidth that large datasets demand. FlashBlade //E, in particular, offers object and file services optimized for modern workloads.
| Feature | UCS X-Series with H100 | Traditional Rack Server (e.g., C240 M7) with H100 |
|---|---|---|
| GPU Density/Node | Up to 4x Double-width H100s (X440p) | Usually 4-8x Double-width H100s |
| Chassis Density | 8 Nodes (up to 32 H100s per X9508 chassis) | Per-server, no chassis pooling |
| Management | Intersight (Cloud-native, Policy-driven) | CIMC (Per-server), Intersight for basic monitoring |
| Networking | Dedicated Cisco VICs, X-Fabric for PCIe Gen5, 10/25/40/100Gbps FI | Dedicated NICs, PCIe Gen5 on motherboard |
| Cabling Complexity | Reduced: X-Fabric and FI consolidate NIC/HBA cables | High: separate cables for each NIC, HBA, management |
| Power/Cooling | Centralized within X9508, up to 14kW/chassis | Distributed per server, typically 3-5kW/server |
| Scalability | Modular, compute/GPU nodes swapped independently | Scale-out by adding individual servers |
| TCO (estimated) | Higher initial CAPEX, lower OPEX via Intersight automation | Lower initial CAPEX, potentially higher OPEX without automation |
CVD AI Pod Sizing Example: 64 H100 GPUs
A typical 64-GPU H100 AI Pod would consist of 16 X440p nodes, each populated with 4 NVIDIA H100 GPUs. These 16 nodes would require 4 Cisco UCS X9508 chassis (4 nodes per chassis for power/cooling headroom, though up to 8 is possible without 4xH100). The chassis connect to a pair of 6536 Fabric Interconnects. Storage would be provided by a NetApp AFF A900 (e.g., 2 controllers, 24x 15.3TB NVMe SSDs) or a Pure FlashBlade //E (e.g., 5-7 blades) for data, connected via 100Gbps RoCEv2. The network fabric would include 2-4 Nexus 9300-FX3 or 9500 switches acting as leaves, uplinking to a spine layer. Software like Red Hat OpenShift AI or VMware vSphere with Tanzu for private AI would orchestrate workloads. Power requirements for 4 chassis (64 H100s plus CPU/RAM overhead) could exceed 50kW, demanding specialized data center infrastructure and advanced cooling. For VM-based GPU workloads (vGPU), ESXi 8 U3 with NVIDIA vGPU drivers on the host and MLNX-OFED drivers in the guest OS (e.g., RHEL 9) are essential. When using Passthrough (DirectPath I/O) for dedicated single-VM access, the GPU is mapped directly to the VM, bypassing hypervisor intervention for maximum performance at the cost of flexibility.
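The sizing arithmetic in this example can be reproduced in a few lines, which makes it easy to rerun for other pod sizes. This is a rough sketch using the article's own figures (4 GPUs per node, 4 nodes per chassis, 12-14 kW per chassis). Budgeting a half-populated chassis at the fully populated power figure is a deliberately conservative assumption.

```python
import math

def size_pod(gpus: int, gpus_per_node: int = 4, nodes_per_chassis: int = 4,
             chassis_kw_range: tuple[float, float] = (12.0, 14.0)) -> dict:
    """Derive node count, chassis count, and a power range for a GPU pod."""
    nodes = math.ceil(gpus / gpus_per_node)
    chassis = math.ceil(nodes / nodes_per_chassis)
    low, high = chassis_kw_range
    return {
        "nodes": nodes,
        "chassis": chassis,
        "power_kw": (chassis * low, chassis * high),  # conservative per-chassis budget
    }

pod = size_pod(64)
print(pod)
```

For 64 GPUs this yields 16 nodes, 4 chassis, and a 48-56 kW budget, which lines up with the "could exceed 50kW" figure above. Changing `gpus_per_node` is the quickest way to see how sensitive the chassis count is to the GPU configuration actually supported per node.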
FlexPod for AI vs. FlashStack for AI
Two prominent converged infrastructure solutions leveraging UCS X-Series for AI are FlexPod for AI (Cisco & NetApp) and FlashStack for AI (Cisco & Pure Storage). Both offer validated designs with predictable performance.
- FlexPod for AI: Leverages NetApp ONTAP and AFF storage. Offers a mature, feature-rich storage platform with robust data management capabilities, including snapshotting, replication, and data tiering. Ideal for organizations already invested in NetApp or requiring sophisticated data lifecycle management for their AI datasets. NetApp's filesystem optimizations, especially with pNFS and direct integration with NVIDIA AI Enterprise, make it a strong contender.
- FlashStack for AI: Utilizes Pure Storage FlashBlade //E. FlashBlade is purpose-built for unstructured data and high-performance file/object workloads, making it exceptionally well-suited for AI/ML. Its all-flash, scale-out architecture delivers low-latency, high-bandwidth access to large datasets, which can be a critical performance differentiator for certain AI training jobs. FlashBlade's simplicity of management and high density are also significant advantages.
The choice between them often comes down to existing vendor relationships, specific storage features required (e.g., SnapMirror vs. ActiveCluster), and budget. Both provide reference architectures for deploying NVIDIA AI Enterprise, ensuring validated performance and supportability.
Day-2 Operations, Troubleshooting, and Failure Modes
Day-2 operations with UCS X-Series under Intersight primarily involve monitoring, patching, and scaling. Intersight's dashboard provides comprehensive visibility into server health, power consumption, and network statistics. Firmware updates are managed via Host Firmware Packages in Intersight, allowing for scheduled, non-disruptive updates. Scaling out compute often means adding more X9508 chassis or populating existing ones with additional compute nodes, then assigning pre-configured server profiles.
Troubleshooting within the X-Series often starts with Intersight's alerts and fault logs. Common issues in AI environments include power over-consumption (resolved by power capping or redistributing loads), network congestion (monitored via FI port statistics, e.g., show interface ethernet 1/17 transceiver details), and GPU thermal throttling (addressed by adjusting Intersight GPU policies or datacenter cooling). A frequently encountered failure mode in high-density GPU deployments is the loss of a chassis power supply, sometimes triggered by a sudden spike in GPU load. The X9508's power supplies are chassis-level rather than per-node, so Intersight flags the fault against the chassis. With adequate PSU redundancy configured, the failed unit can be hot-swapped while the nodes keep running. Fabric Interconnect failures are less common because of the redundant design. Because each node is normally connected to both fabrics, losing one FI removes one of the two network paths and roughly half the available bandwidth rather than isolating nodes outright. Traffic fails over to the surviving FI, which can still be disruptive for AI training if RDMA traffic state is lost.
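On the host side, thermal and power headroom can be checked from the operating system as a complement to Intersight's alerts. The sketch below parses the CSV output of `nvidia-smi --query-gpu=index,temperature.gpu,power.draw,power.limit --format=csv,noheader,nounits` and flags GPUs that are hot or running close to their power limit. The 80 °C and 95% thresholds are illustrative assumptions, and the sample text stands in for real command output.

```python
import csv
import io

def parse_gpu_csv(text: str) -> list[dict]:
    """Parse 'index, temp, power.draw, power.limit' rows from nvidia-smi CSV output."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # tolerate blank lines
        idx, temp, draw, limit = rec
        rows.append({"index": int(idx), "temp_c": float(temp),
                     "draw_w": float(draw), "limit_w": float(limit)})
    return rows

def flag_gpus(rows: list[dict], temp_c: float = 80.0, power_frac: float = 0.95) -> list[int]:
    """Indices of GPUs at or above the temperature or power-fraction threshold."""
    return [r["index"] for r in rows
            if r["temp_c"] >= temp_c or r["draw_w"] >= power_frac * r["limit_w"]]

# Sample text in the shape nvidia-smi emits with csv,noheader,nounits.
sample = "0, 62, 301.22, 350.00\n1, 84, 348.10, 350.00\n"
rows = parse_gpu_csv(sample)
print(flag_gpus(rows))
```

In practice the sample string would be replaced by the captured output of the command, for example via `subprocess.run(..., capture_output=True, text=True).stdout`, and the flagged indices fed into whatever alerting the cluster already uses.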
For network troubleshooting, direct CLI access to the 6536 FIs (after logging into the Fabric Interconnect via Intersight or SSH) is indispensable. Commands like show interface transceiver details, show system health, and show module provide deep insights into physical layer issues, optics health, and module status, mirroring a traditional NX-OS environment.
Verdict
For high-performance AI/ML infrastructure in 2026, the Cisco UCS X-Series with X440p compute nodes and NVIDIA H100/H200 GPUs managed by Intersight is the clear architectural winner for large-scale, datacenter-deployed AI. Its X-Fabric design provides unparalleled PCIe Gen5 bandwidth and direct I/O to accelerators, crucial for eliminating bottlenecks. The Intersight management paradigm simplifies operations and automation for complex AI environments. For organizations prioritizing automation, density, and validated designs, the X-Series is superior to disparate rack servers. When combined with a Pure FlashBlade //E for data-intensive workflows or a NetApp AFF A900 for comprehensive data management, the solution provides an end-to-end compute and storage platform capable of hosting the most demanding AI workloads. The trade-off is often a higher initial CAPEX compared to building with commodity rack servers, but this is offset by lower OPEX through streamlined management and superior performance per Watt/Rack Unit.
Frequently asked questions
What is the primary advantage of Cisco UCS X-Series over UCS B-Series for AI workloads?
The primary advantage is the X-Fabric Technology, which replaces the traditional midplane with direct PCIe Gen5 connectivity. This allows for significantly higher bandwidth and lower latency to internal components like NVIDIA H100 GPUs within the X440p nodes, bypassing bottlenecks inherent in the older shared midplane design of the UCS B-Series. It also enables higher power delivery per slot.
Can I upgrade my existing UCS B-Series infrastructure to UCS X-Series for AI?
No, it's not a direct upgrade. The UCS X-Series, managed by Intersight Managed Mode (IMM), operates on a fundamentally different architecture from UCS B-Series managed by UCS Manager (UCSM). Migrating requires deploying a new UCS X-Series domain and then migrating workloads, rather than an in-place conversion of hardware or management. This is typically a greenfield deployment for AI workloads.
What are the power and cooling considerations for a fully populated UCS X9508 with H100 GPUs?
A fully populated UCS X9508 chassis with X440p nodes, each containing four H100 GPUs, can draw 12-14 kW. This significantly exceeds typical rack power densities. It necessitates careful planning for high-density rack PDUs (e.g., 3-phase 400V/32A), potentially requiring hot-aisle containment or even direct-to-chip liquid cooling solutions for the GPUs and associated rack infrastructure. Standard datacenter cooling might be insufficient.
What GPU virtualization options are available for X-Series with NVIDIA H100 GPUs?
With VMware vSphere ESXi 8 U3 and NVIDIA vGPU software (part of NVIDIA AI Enterprise), you can configure H100 GPUs for either full passthrough (DirectPath I/O) to a single VM for maximum performance or use NVIDIA's vGPU profiles to partition a single physical H100 into multiple virtual GPUs, allowing multiple VMs to share the GPU's resources. Intersight GPU policies manage these configurations.
How does Intersight simplify management compared to traditional UCS Manager for AI infrastructure?
Intersight offers cloud-native, declarative management, allowing you to define infrastructure as code using server profiles, network policies, and GPU policies. This ensures consistency, automates deployment, and simplifies Day-2 operations like firmware updates and scaling. UCS Manager is domain-centric and requires more manual configuration per domain, whereas Intersight provides a global, holistic view and management plane across multiple UCS domains.
Which storage solution is recommended for high-performance AI on UCS X-Series, FlexPod for AI or FlashStack for AI?
Both are excellent, but their strengths differ. FlashStack for AI with Pure Storage FlashBlade //E excels in raw, low-latency, high-bandwidth file and object access, making it ideal for datasets used in large-scale model training. FlexPod for AI with NetApp AFF A900 provides a feature-rich, mature enterprise storage platform with advanced data management capabilities, suitable for organizations needing comprehensive data lifecycle and protection alongside AI. The best choice depends on specific performance needs, existing vendor relationships, and long-term data management strategy.