Environment Modules
Categories:
Overview
The Prometheus cluster uses Lmod (Lua-based Environment Modules) to manage software packages and their dependencies. This system allows you to easily load and unload different software versions without conflicts.
Module System Basics
Environment modules modify your shell environment to provide access to specific software packages. When you load a module, it typically:
- Adds software to your
PATH - Sets environment variables
- Loads required dependencies
- Configures library paths
Basic Module Commands
List Available Modules
# Show all available modules
module available
module avail
module av
ml av
# Search for specific modules
module avail gcc
module avail python
module avail cuda
Load Modules
# Load a specific module
module load GCC/10.3.0
module load CUDA/11.3.1
module load Python/3.9.5
# Short form using 'ml'
ml GCC/10.3.0 CUDA/11.3.1 Python/3.9.5
Check Loaded Modules
# List currently loaded modules
module list
ml list
Unload Modules
# Unload a specific module
module unload GCC/10.3.0
# Unload all modules
module purge
ml purge
Common Software Modules
Compilers
# GNU Compiler Collection
module load GCC/10.3.0
module load GCC/11.2.0
# Intel Compilers (if available)
module load intel/2021.4.0
CUDA and GPU Development
# CUDA Toolkit
module load CUDA/11.3.1
module load CUDA/11.7.0
module load CUDA/12.0.0
# Check CUDA after loading
nvcc --version
nvidia-smi
Python Environments
# Python interpreter
module load Python/3.9.5
module load Python/3.10.8
# Python with scientific libraries
module load Python/3.9.5-GCCcore-10.3.0
Deep Learning Frameworks
# PyTorch (if pre-installed as module)
module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
# TensorFlow (if pre-installed as module)
module load TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0
Development Tools
# Git (usually available by default)
module load git/2.36.0
# CMake
module load CMake/3.24.3
# HDF5 for data storage
module load HDF5/1.12.2
Module Dependencies
Lmod automatically handles dependencies. When you load a module, it loads required dependencies:
# Loading Python might automatically load GCC
module load Python/3.9.5
# Check what was loaded
module list
# Might show:
# GCCcore/10.3.0
# Python/3.9.5-GCCcore-10.3.0
Setting Up Your Environment
Create a Module Loading Script
Create ~/load_modules.sh for consistent environment setup:
#!/bin/bash
# ~/load_modules.sh - Load standard development environment
# Clear any existing modules
module purge
# Load core development tools
module load GCC/10.3.0
module load CUDA/11.3.1
module load Python/3.9.5
# Optional: Load additional tools
# module load git/2.36.0
# module load CMake/3.24.3
echo "Development environment loaded:"
module list
Make it executable and use it:
chmod +x ~/load_modules.sh
source ~/load_modules.sh
Add to Your Shell Configuration
Add common modules to your ~/.bashrc:
# Add to ~/.bashrc
# Load standard modules at login
if [ -f ~/load_modules.sh ]; then
source ~/load_modules.sh
fi
SLURM Job Scripts with Modules
Basic Job with Modules
#!/bin/bash
#SBATCH -J module_job
#SBATCH --partition=defq
#SBATCH --qos=normal
#SBATCH --gres=gpu:1
#SBATCH --time=2:00:00
# Load required modules
module purge
module load CUDA/11.3.1
module load Python/3.9.5
# Verify modules are loaded
echo "Loaded modules:"
module list
# Check CUDA availability
echo "CUDA version:"
nvcc --version
# Activate your conda environment
source ~/anaconda3/bin/activate
conda activate myenv
# Run your script
python train.py
Multiple GPU Job with Modules
#!/bin/bash
#SBATCH -J multi_gpu_training
#SBATCH --partition=defq
#SBATCH --qos=long
#SBATCH --gres=gpu:4
#SBATCH --time=1-00:00:00
# Load modules for CUDA development
module load GCC/10.3.0
module load CUDA/11.3.1
module load Python/3.9.5
# Load MPI for distributed computing (if available)
# module load OpenMPI/4.1.4-GCC-10.3.0
# Set CUDA environment
export CUDA_VISIBLE_DEVICES=0,1,2,3
# Activate environment
conda activate pytorch-gpu
# Run distributed training
python -m torch.distributed.launch \
--nproc_per_node=4 \
train_distributed.py
Python Package Management
Using Conda with Modules
# Load Python module
module load Python/3.9.5
# Install conda (if not already available)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
# Add conda to path
echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
# Create environment
conda create -n myenv python=3.9
conda activate myenv
# Install packages
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
Using pip with Modules
# Load Python module
module load Python/3.9.5
# Create virtual environment
python -m venv ~/venvs/myproject
source ~/venvs/myproject/bin/activate
# Upgrade pip
pip install --upgrade pip
# Install packages
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install jupyter numpy pandas matplotlib
Module Collections
Save frequently used module combinations:
# Save current modules as a collection
module save my_collection
# List saved collections
module savelist
# Restore a collection
module restore my_collection
Custom Module Paths
If your group has custom modules:
# Add custom module path
module use /lustreFS/data/mygroup/modules
# Check module paths
module show MODULEPATH
Troubleshooting Modules
Common Issues
Module not found:
# Check available modules
module avail | grep -i package_name
# Check if you have access to the module path
ls -la /opt/modules/
Conflicting modules:
# Clear all modules and start fresh
module purge
module load GCC/10.3.0 CUDA/11.3.1
CUDA not found after loading:
# Verify CUDA module is loaded
module list | grep -i cuda
# Check CUDA environment
echo $CUDA_HOME
echo $CUDA_PATH
which nvcc
Python packages not found:
# Ensure Python module is loaded before using pip/conda
module load Python/3.9.5
which python
python --version
Module Information
# Show detailed module information
module show CUDA/11.3.1
module help CUDA/11.3.1
# See what a module does before loading
module display CUDA/11.3.1
Best Practices
For Interactive Development
- Create a standard environment script
- Use module collections for frequently used combinations
- Load modules before activating conda/venv
For SLURM Jobs
- Always start with
module purge - Load modules explicitly in job scripts
- Verify modules are loaded with
module list - Document module requirements in your scripts
For Reproducibility
Pin module versions in scripts:
module load CUDA/11.3.1 # Not just 'CUDA'Document module requirements:
# Required modules: # - GCC/10.3.0 # - CUDA/11.3.1 # - Python/3.9.5Use environment files for complex setups
Example Workflows
Deep Learning Setup
# Standard deep learning environment
module purge
module load GCC/10.3.0
module load CUDA/11.3.1
module load Python/3.9.5
# Activate conda environment
conda activate pytorch-env
# Verify setup
python -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}')"
Development Environment
# Development tools
module purge
module load GCC/10.3.0
module load CUDA/11.3.1
module load Python/3.9.5
module load git/2.36.0
module load CMake/3.24.3
# Save as collection
module save development
# Later, restore quickly
module restore development
Compilation Environment
# For compiling CUDA code
module purge
module load GCC/10.3.0
module load CUDA/11.3.1
# Compile CUDA program
nvcc -o program program.cu
# For C++ with GPU support
g++ -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcudart program.cpp -o program
Next Steps
- Configure VS Code for remote development: VS Code Setup
- Learn about software installation: Software Installation
- Submit your first job: Job Submission