Hardware Specifications
Detailed hardware specifications of the Prometheus cluster
Cluster Overview
The Prometheus cluster features a modern architecture optimized for deep learning workloads, combining high-performance GPUs, abundant memory, and fast storage.
Head Node
Management and login node for the cluster
Hardware Configuration
- Chassis: GIGABYTE R182-Z90-00
- Motherboard: GIGABYTE MZ92-FS0-00
- CPU: 2× AMD EPYC 7313 (16 cores/32 threads each)
- Total CPU Cores: 32 cores / 64 threads
- RAM: 16× 32GB Samsung M393A4K40EB3-CWE
- Total RAM: 512GB DDR4
- Storage: 2× 1.92TB Intel SSDSC2KB019T8 SSD
- File System: /trinity/home (400GB allocated)
Purpose
- SSH login and job submission
- File management and transfers
- SLURM job scheduling
- Not for compute workloads
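Jobs are prepared as SLURM batch scripts and submitted from the head node with sbatch. The sketch below shows one way to drive a submission from Python; the script name train_job.sbatch is hypothetical and stands in for your own job script.

```python
# Minimal sketch: submit a SLURM batch script from the head node.
# "train_job.sbatch" is a hypothetical placeholder for your own job script.
import subprocess

def submit_job(script_path: str) -> str:
    """Submit a batch script with sbatch and return the job ID it reports."""
    result = subprocess.run(
        ["sbatch", "--parsable", script_path],  # --parsable prints only the job ID
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print("Submitted job", submit_job("train_job.sbatch"))
```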
Compute Nodes
GPU Nodes gpu[01-08] (8 nodes)
Primary compute nodes with NVIDIA A5000 GPUs
Hardware Configuration
- Chassis: Supermicro AS-4124GS-TNR
- Motherboard: Supermicro H12DSG-O-CPU
- CPU: 2× AMD EPYC 7313 (16 cores/32 threads each)
- Total CPU Cores: 32 cores / 64 threads per node
- RAM: 16× 32GB SK Hynix HMAA4GR7AJR8N-XN
- Total RAM: 512GB DDR4 per node
- Local Storage: 1× 1TB Samsung SSD 980 NVMe
GPU Specifications
- GPU Model: NVIDIA RTX A5000
- GPU Count: 8 GPUs per node (64 total across all nodes)
- GPU Memory: 24GB GDDR6 per GPU
- CUDA Cores: 8,192 per GPU
- Tensor Cores: 256 (3rd gen); RT Cores: 64 (2nd gen)
- Peak Performance: 27.8 TFLOPS FP32 per GPU
- Memory Bandwidth: 768 GB/s per GPU
Total Resources (gpu[01-08])
- Total GPUs: 64× NVIDIA A5000
- Total GPU Memory: 1,536GB (1.5TB)
- Total CPU Cores: 256 cores / 512 threads
- Total System RAM: 4TB DDR4
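The per-GPU figures above can be checked from inside an allocation on any gpu[01-08] node. A minimal sketch, assuming PyTorch with CUDA support is available in the active environment:

```python
# Minimal sketch: list the GPUs visible to a job and their memory.
# Assumes PyTorch with CUDA support; run inside a SLURM allocation with a GPU request.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, "
              f"{props.total_memory / 1024**3:.1f} GB, "
              f"{props.multi_processor_count} SMs")
else:
    print("No CUDA devices visible - check the job's GPU request")
```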
GPU Node gpu09 (1 node)
High-memory GPU node with NVIDIA A6000 Ada
Hardware Configuration
- Chassis: ASUS RS720A-E11-RS12
- Motherboard: ASUS KMPP-D32
- CPU: 2× AMD EPYC 7313 (16 cores/32 threads each)
- Total CPU Cores: 32 cores / 64 threads
- RAM: 16× 32GB SK Hynix HMAA4GR7AJR8N-XN
- Total RAM: 512GB DDR4
- Local Storage: 1× 1TB Samsung SSD 980 NVMe
GPU Specifications
- GPU Model: NVIDIA RTX A6000 Ada Generation
- GPU Count: 4 GPUs
- GPU Memory: 48GB GDDR6 per GPU
- CUDA Cores: 18,176 per GPU
- Tensor Cores: 568 (4th gen)
- Peak Performance: 91.06 TFLOPS FP32 per GPU
- Memory Bandwidth: 960 GB/s per GPU
Total Resources (gpu09)
- Total GPUs: 4× NVIDIA A6000 Ada
- Total GPU Memory: 192GB
- Total CPU Cores: 32 cores / 64 threads
- Total System RAM: 512GB DDR4
Storage Nodes
Storage Architecture
High-performance parallel file system
Hardware Configuration (2 storage nodes)
- Chassis: Supermicro Super Server
- Motherboard: Supermicro H12SSL-i
- CPU: 1× AMD EPYC 7302P (16 cores/32 threads)
- RAM: 8× 16GB Samsung M393A2K40DB3-CWE
- Total RAM: 256GB DDR4 per node
- OS Storage: 2× 240GB Intel SSDSC2KB240G7 SSD
- Data Storage: 24× 7.68TB Samsung MZILT7T6HALA/007 SAS SSD
Storage Specifications
- File System: Lustre parallel file system
- Mount Point: /lustreFS
- Raw Capacity: ~184TB per storage node (24 × 7.68TB)
- Total Raw Capacity: ~368TB across both nodes
- Usable Capacity: ~305TB (after RAID and file system overhead)
- Performance: High-throughput parallel I/O
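Free space on the shared mount can be checked before launching data-heavy jobs. A minimal sketch using the /lustreFS path listed above; /tmp stands in for node-local scratch and may differ on your node:

```python
# Minimal sketch: report capacity and free space on shared and local storage.
# /lustreFS is the Lustre mount from this page; /tmp is an assumed stand-in
# for node-local NVMe scratch and may differ on your system.
import shutil

for path in ("/lustreFS", "/tmp"):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.free / 1024**4:.1f} TB free of {usage.total / 1024**4:.1f} TB")
```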
Software Environment
Operating System
- Distribution: Rocky Linux 8.5 (Green Obsidian)
- Kernel Version: 4.18.0-348.23.1.el8_5.x86_64
- Architecture: x86_64
Management Software
- Job Scheduler: SLURM Workload Manager
- Module System: Lmod (Lua-based Environment Modules)
- File System: Lustre for parallel storage
Development Tools
- CUDA Toolkit: 11.3+ with cuDNN
- Compilers: GCC, Intel, NVCC
- MPI: OpenMPI, MPICH
- Python: Multiple versions with conda/pip
- Deep Learning: PyTorch, TensorFlow, JAX
- Containers: Singularity/Apptainer support
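After loading the relevant modules with Lmod (module names are site-specific), the stack can be sanity-checked from Python. A minimal sketch, assuming PyTorch is installed in the active conda environment; adapt for TensorFlow or JAX:

```python
# Minimal sketch: confirm the CUDA-enabled deep learning stack is usable.
# Assumes PyTorch is installed in the active environment.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
print("CUDA version:   ", torch.version.cuda)
print("Visible GPUs:   ", torch.cuda.device_count())
```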
Network Architecture
Interconnect
- Compute Network: High-speed Ethernet
- Storage Network: Dedicated Lustre network
- Management Network: Separate administrative network
Bandwidth
- Node-to-Node: High-bandwidth for distributed training
- Storage Access: Optimized for parallel I/O workloads
- External Access: Internet connectivity for downloads
Performance Characteristics
Compute Performance
- Total GPU Performance:
  - A5000 nodes: 1,779 TFLOPS FP32 (64 × 27.8)
  - A6000 node: 364 TFLOPS FP32 (4 × 91.06)
  - Combined: ~2,143 TFLOPS FP32
- Memory Bandwidth:
  - A5000 total: 49,152 GB/s
  - A6000 total: 3,840 GB/s
  - Combined: ~53 TB/s GPU memory bandwidth
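For reference, the aggregate figures above follow directly from the per-GPU numbers listed earlier on this page:

```python
# Worked arithmetic for the aggregate figures, using per-GPU values from this page.
a5000 = {"count": 64, "tflops_fp32": 27.8, "bw_gbs": 768}
a6000 = {"count": 4, "tflops_fp32": 91.06, "bw_gbs": 960}

tflops = a5000["count"] * a5000["tflops_fp32"] + a6000["count"] * a6000["tflops_fp32"]
bw_gbs = a5000["count"] * a5000["bw_gbs"] + a6000["count"] * a6000["bw_gbs"]

print(f"Combined FP32: {tflops:,.0f} TFLOPS")               # ~2,143 TFLOPS
print(f"Combined GPU bandwidth: {bw_gbs / 1000:.1f} TB/s")  # ~53.0 TB/s
```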
Storage Performance
- Lustre File System: High-throughput parallel I/O
- Local NVMe: High IOPS for temporary data
- Home Directories: SSD-backed for fast access
Resource Allocation
Per-Node Resources
- CPU Cores: 32 physical / 64 logical per node
- System Memory: 512GB DDR4 per node
- GPU Memory:
  - A5000 nodes: 192GB per node (8 × 24GB)
  - A6000 node: 192GB (4 × 48GB)
- Local Storage: 1TB NVMe SSD per compute node
Total Cluster Resources
- Compute Nodes: 9 total
- CPU Cores: 288 physical / 576 logical
- System Memory: 4.5TB DDR4
- GPUs: 68 total (64 A5000 + 4 A6000 Ada)
- GPU Memory: 1.728TB total
- Shared Storage: 305TB Lustre + local NVMe
Use Cases and Workloads
Optimized For
- Large Language Models: High GPU memory for transformer models
- Computer Vision: Parallel training on multiple GPUs
- Distributed Training: Multi-node deep learning (see the initialization sketch after this list)
- High-throughput Computing: Batch processing workflows
- Interactive Development: Jupyter notebooks and VS Code
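For multi-node runs, each process typically initializes torch.distributed with the NCCL backend. A minimal sketch, assuming the job is launched with one SLURM task per GPU (e.g. via srun) and that MASTER_ADDR/MASTER_PORT are exported in the batch script; launching with torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for you instead:

```python
# Minimal sketch: initialize torch.distributed from SLURM environment variables.
# Assumes one task per GPU and that MASTER_ADDR / MASTER_PORT are already
# exported (e.g. to the first node's hostname and a free port).
import os
import torch
import torch.distributed as dist

def init_from_slurm() -> int:
    rank = int(os.environ["SLURM_PROCID"])         # global rank across all nodes
    world_size = int(os.environ["SLURM_NTASKS"])   # total number of tasks
    local_rank = int(os.environ["SLURM_LOCALID"])  # task index within this node
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    return local_rank

if __name__ == "__main__":
    local_rank = init_from_slurm()
    print(f"Rank {dist.get_rank()}/{dist.get_world_size()} on GPU {local_rank}")
    dist.destroy_process_group()
```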
Performance Considerations
- Memory-bound workloads: Benefit from the A6000 Ada’s 48GB VRAM (see the sizing sketch after this list)
- Compute-intensive tasks: Leverage A5000’s efficiency
- Data-intensive jobs: Utilize high-performance Lustre storage
- Multi-GPU training: Scale across nodes with SLURM
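As a rough sizing aid for the memory-bound case, the sketch below uses the common rule of thumb of about 16 bytes of training state per parameter for mixed-precision Adam (fp16 weights and gradients plus fp32 master weights and two optimizer moments); activations are excluded, so treat the result as a lower bound:

```python
# Rough lower-bound estimate of training state per GPU, assuming ~16 bytes/parameter
# for mixed-precision Adam (fp16 weights + grads, fp32 master weights + two moments).
# Activation memory is workload-dependent and not included.
def training_state_gb(n_params: float, bytes_per_param: int = 16) -> float:
    return n_params * bytes_per_param / 1024**3

for billions in (1, 3, 7):
    gb = training_state_gb(billions * 1e9)
    print(f"{billions}B params: ~{gb:.0f} GB state "
          f"(24GB A5000: {'fits' if gb < 24 else 'needs sharding'}, "
          f"48GB A6000 Ada: {'fits' if gb < 48 else 'needs sharding'})")
```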