Hardware Specifications

Detailed hardware specifications of the Prometheus cluster

Cluster Overview

The Prometheus cluster features a modern architecture optimized for deep learning workloads with high-performance GPUs, abundant memory, and fast storage systems.

Head Node

Management and login node for the cluster

Hardware Configuration

  • Chassis: GIGABYTE R182-Z90-00
  • Motherboard: GIGABYTE MZ92-FS0-00
  • CPU: 2× AMD EPYC 7313 (16 cores/32 threads each)
  • Total CPU Cores: 32 cores / 64 threads
  • RAM: 16× 32GB Samsung M393A4K40EB3-CWE
  • Total RAM: 512GB DDR4
  • Storage: 2× 1.92TB Intel SSDSC2KB019T8 SSD
  • Home Directories: /trinity/home (400GB allocated)

Purpose

  • SSH login and job submission
  • File management and transfers
  • SLURM job scheduling (see the example script below)
  • Not for compute workloads
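
For example, a minimal batch script submitted from the head node might look like the following sketch; the partition name and resource values are assumptions to adapt (check sinfo for the partitions that actually exist):

    #!/bin/bash
    #SBATCH --job-name=example        # name shown in the queue
    #SBATCH --partition=gpu           # assumed partition name; verify with sinfo
    #SBATCH --gres=gpu:1              # request one GPU
    #SBATCH --cpus-per-task=8         # CPU cores for data loading
    #SBATCH --mem=64G                 # system RAM
    #SBATCH --time=02:00:00           # wall-clock limit

    # The workload runs on the allocated compute node, never on the head node
    srun python train.py

Submit with sbatch job.sh and check its status with squeue --me.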

Compute Nodes

GPU Nodes gpu[01-08] (8 nodes)

Primary compute nodes with NVIDIA RTX A5000 GPUs

Hardware Configuration

  • Chassis: Supermicro AS-4124GS-TNR
  • Motherboard: Supermicro H12DSG-O-CPU
  • CPU: 2× AMD EPYC 7313 (16 cores/32 threads each)
  • Total CPU Cores: 32 cores / 64 threads per node
  • RAM: 16× 32GB SK Hynix HMAA4GR7AJR8N-XN
  • Total RAM: 512GB DDR4 per node
  • Local Storage: 1× 1TB Samsung SSD 980 NVMe

GPU Specifications

  • GPU Model: NVIDIA RTX A5000
  • GPU Count: 8 GPUs per node (64 total across all nodes)
  • GPU Memory: 24GB GDDR6 per GPU
  • CUDA Cores: 8,192 per GPU
  • Tensor Cores: 256 (3rd gen)
  • RT Cores: 64 (2nd gen)
  • Peak Performance: 27.8 TFLOPS FP32 per GPU
  • Memory Bandwidth: 768 GB/s per GPU

Total Resources (gpu[01-08])

  • Total GPUs: 64× NVIDIA RTX A5000
  • Total GPU Memory: 1,536GB (1.5TB)
  • Total CPU Cores: 256 cores / 512 threads
  • Total System RAM: 4TB DDR4
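
Inside a job on one of these nodes, the GPUs actually allocated can be confirmed with nvidia-smi (SLURM restricts visibility to the GPUs granted via --gres):

    # List the GPUs visible to the current job
    nvidia-smi --query-gpu=index,name,memory.total --format=csv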

GPU Node gpu09 (1 node)

High-memory GPU node with NVIDIA RTX 6000 Ada GPUs

Hardware Configuration

  • Chassis: ASUS RS720A-E11-RS12
  • Motherboard: ASUS KMPP-D32
  • CPU: 2× AMD EPYC 7313 (16 cores/32 threads each)
  • Total CPU Cores: 32 cores / 64 threads
  • RAM: 16× 32GB SK Hynix HMAA4GR7AJR8N-XN
  • Total RAM: 512GB DDR4
  • Local Storage: 1× 1TB Samsung SSD 980 NVMe

GPU Specifications

  • GPU Model: NVIDIA RTX 6000 Ada Generation
  • GPU Count: 4 GPUs
  • GPU Memory: 48GB GDDR6 per GPU
  • CUDA Cores: 18,176 per GPU
  • Tensor Cores: 568 (4th gen)
  • RT Cores: 142 (3rd gen)
  • Peak Performance: 91.06 TFLOPS FP32 per GPU
  • Memory Bandwidth: 960 GB/s per GPU

Total Resources (gpu09)

  • Total GPUs: 4× NVIDIA RTX 6000 Ada
  • Total GPU Memory: 192GB
  • Total CPU Cores: 32 cores / 64 threads
  • Total System RAM: 512GB DDR4
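
Jobs that need the 48GB cards can pin themselves to this node with standard SLURM options; a minimal sketch (the surrounding partition setup is an assumption):

    #SBATCH --nodelist=gpu09      # run only on the high-memory GPU node
    #SBATCH --gres=gpu:2          # two of the four RTX 6000 Ada GPUs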

Storage Nodes

Storage Architecture

High-performance parallel file system

Hardware Configuration (2 storage nodes)

  • Chassis: Supermicro Super Server
  • Motherboard: Supermicro H12SSL-i
  • CPU: 1× AMD EPYC 7302P (16 cores/32 threads)
  • RAM: 8× 16GB Samsung M393A2K40DB3-CWE
  • Total RAM: 128GB DDR4 per node
  • OS Storage: 2× 240GB Intel SSDSC2KB240G7 SSD
  • Data Storage: 24× 7.68TB Samsung MZILT7T6HALA/007 NVMe SSD

Storage Specifications

  • File System: Lustre parallel file system (see the striping example below)
  • Mount Point: /lustreFS
  • Raw Capacity: ~184TB per storage node (24 × 7.68TB)
  • Total Raw Capacity: ~368TB across both nodes
  • Usable Capacity: ~305TB (after RAID and file system overhead)
  • Performance: High-throughput parallel I/O
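
For large files read by many processes in parallel, striping across the Lustre object storage targets usually helps; a sketch using the standard lfs tool (the directory path is illustrative):

    # Show the current stripe layout of a directory
    lfs getstripe /lustreFS/data/myproject

    # Stripe new files in this directory across 4 OSTs with a 4MB stripe size
    lfs setstripe -c 4 -S 4M /lustreFS/data/myproject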

Software Environment

Operating System

  • Distribution: Rocky Linux 8.5 (Green Obsidian)
  • Kernel Version: 4.18.0-348.23.1.el8_5.x86_64
  • Architecture: x86_64

Management Software

  • Job Scheduler: SLURM Workload Manager
  • Module System: Lmod (Lua-based Environment Modules; usage example below)
  • File System: Lustre for parallel storage
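
Typical Lmod usage looks like the following; the exact module names are assumptions, so list what is installed first:

    module avail              # list installed modules
    module load cuda/11.3     # load a toolchain (module name assumed)
    module list               # show what is currently loaded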

Development Tools

  • CUDA Toolkit: 11.3+ with cuDNN
  • Compilers: GCC, Intel, NVCC
  • MPI: OpenMPI, MPICH
  • Python: Multiple versions with conda/pip
  • Deep Learning: PyTorch, TensorFlow, JAX
  • Containers: Singularity/Apptainer support
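
As a sketch, a containerized PyTorch run with GPU passthrough would look like this (image tag illustrative); the --nv flag maps the host's NVIDIA driver into the container:

    # Pull an image from Docker Hub and run it with GPU access
    apptainer pull pytorch.sif docker://pytorch/pytorch:latest
    apptainer exec --nv pytorch.sif \
        python -c "import torch; print(torch.cuda.is_available())"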

Network Architecture

Interconnect

  • Compute Network: High-speed Ethernet
  • Storage Network: Dedicated Lustre network
  • Management Network: Separate administrative network

Bandwidth

  • Node-to-Node: High-bandwidth for distributed training
  • Storage Access: Optimized for parallel I/O workloads
  • External Access: Internet connectivity for downloads

Performance Characteristics

Compute Performance

  • Total GPU Performance:
    • A5000 nodes: 1,779 TFLOPS FP32 (64 × 27.8)
    • RTX 6000 Ada node: 364 TFLOPS FP32 (4 × 91.06)
    • Combined: ~2,143 TFLOPS FP32
  • Memory Bandwidth:
    • A5000 total: 49,152 GB/s
    • RTX 6000 Ada total: 3,840 GB/s
    • Combined: ~53TB/s GPU memory bandwidth

Storage Performance

  • Lustre File System: High-throughput parallel I/O
  • Local NVMe: High IOPS for temporary data
  • Home Directories: SSD-backed for fast access

Resource Allocation

Per-Node Resources

  • CPU Cores: 32 physical / 64 logical per node
  • System Memory: 512GB DDR4 per node
  • GPU Memory:
    • A5000 nodes: 192GB per node (8 × 24GB)
    • RTX 6000 Ada node: 192GB (4 × 48GB)
  • Local Storage: 1TB NVMe SSD per compute node
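
These limits bound what a single job can request; a full-node allocation on an A5000 node would therefore top out at roughly:

    #SBATCH --nodes=1
    #SBATCH --gres=gpu:8          # all eight A5000s on one node
    #SBATCH --cpus-per-task=64    # all logical cores
    #SBATCH --mem=0               # in SLURM, --mem=0 requests all memory on the node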

Total Cluster Resources

  • Compute Nodes: 9 total
  • CPU Cores: 288 physical / 576 logical
  • System Memory: 4.5TB DDR4
  • GPUs: 68 total (64 RTX A5000 + 4 RTX 6000 Ada)
  • GPU Memory: 1.728TB total
  • Shared Storage: 305TB Lustre + local NVMe

Use Cases and Workloads

Optimized For

  • Large Language Models: High GPU memory for transformer models
  • Computer Vision: Parallel training on multiple GPUs
  • Distributed Training: Multi-node deep learning
  • High-throughput Computing: Batch processing workflows
  • Interactive Development: Jupyter notebooks and VS Code
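
For the interactive case, a shell on a compute node can be requested with srun (partition name assumed, as above) and a Jupyter server started from there:

    # Interactive session with one GPU for notebook or debugging work
    srun --partition=gpu --gres=gpu:1 --cpus-per-task=8 --mem=32G \
         --time=04:00:00 --pty bash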

Performance Considerations

  • Memory-bound workloads: Benefit from the RTX 6000 Ada's 48GB VRAM
  • Compute-intensive tasks: Leverage A5000’s efficiency
  • Data-intensive jobs: Utilize high-performance Lustre storage
  • Multi-GPU training: Scale across nodes with SLURM
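
A hedged sketch of a two-node data-parallel run with torchrun (script name and rendezvous port are illustrative):

    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --gres=gpu:8
    #SBATCH --ntasks-per-node=1

    # One launcher per node, eight workers each, rendezvous on the first node
    head_node=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
    srun torchrun --nnodes=2 --nproc_per_node=8 \
         --rdzv_backend=c10d --rdzv_endpoint="${head_node}:29500" \
         train.py

Here srun starts one torchrun launcher per node, and each launcher spawns eight worker processes, one per GPU.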