Software Installation

Installing and managing software packages on the Prometheus cluster

Overview

The Prometheus cluster provides several methods for installing and managing software packages. This guide covers both system-wide modules and user-specific installations.

Installation Methods

1. Environment Modules (Recommended)

Use pre-installed software via the module system when available:

module avail python
module load Python/3.9.5

2. Conda/Mamba Package Manager

Install packages in isolated environments:

conda create -n myenv python=3.9 pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

3. pip Package Manager

Install Python packages via pip:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

4. Source Installation

Compile software from source when needed:

git clone https://github.com/project/repo.git
cd repo && pip install .

Setting Up Python Environments

Conda Installation

If conda is not available, install Miniconda:

# Download Miniconda
cd /lustreFS/data/mygroup/$USER
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Install Miniconda
bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3

# Initialize conda
~/miniconda3/bin/conda init bash
source ~/.bashrc

Create Virtual Environments

# Create a new environment
conda create -n pytorch-env python=3.9

# Activate environment
conda activate pytorch-env

# Install packages
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install jupyter matplotlib pandas scikit-learn

Environment Management

# List environments
conda env list

# Export environment
conda env export > environment.yml

# Create from file
conda env create -f environment.yml

# Remove environment
conda env remove -n old-env
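
The environment listing above can also be read from a script, for example to check that an environment exists before a job starts. The following is a minimal sketch: the helper names `parse_env_list` and `list_conda_envs` are illustrative, and it assumes `conda env list --json` returns an object with an `envs` list of paths. Names are derived from the last path component, so the base environment appears under its install directory name (for example `miniconda3`) rather than `base`.

```python
import json
import subprocess
from pathlib import Path


def parse_env_list(json_text):
    """Map environment name -> path from `conda env list --json` output."""
    paths = json.loads(json_text).get("envs", [])
    # The last path component is the environment name for named environments.
    return {Path(p).name: p for p in paths}


def list_conda_envs():
    """Run conda and return its environments; requires conda on PATH."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_env_list(result.stdout)
```

Calling `list_conda_envs()` on a login node returns a dictionary such as `{"pytorch-env": "/path/to/envs/pytorch-env"}`, which a script can test with `"pytorch-env" in list_conda_envs()`.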

Deep Learning Frameworks

PyTorch Installation

# Create PyTorch environment
conda create -n pytorch python=3.9
conda activate pytorch

# Install PyTorch with CUDA support
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

# Verify installation
python -c "import torch; print(f'PyTorch {torch.__version__}, CUDA: {torch.cuda.is_available()}')"

TensorFlow Installation

# Create TensorFlow environment
conda create -n tensorflow python=3.9
conda activate tensorflow

# Install TensorFlow
pip install "tensorflow[and-cuda]"

# Verify GPU support
python -c "import tensorflow as tf; gpus = tf.config.list_physical_devices('GPU'); print(f'TensorFlow {tf.__version__}, GPUs: {len(gpus)}')"

JAX Installation

# Create JAX environment
conda create -n jax python=3.9
conda activate jax

# Install JAX with CUDA support
pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# Verify installation
python -c "import jax; print(f'JAX devices: {jax.devices()}')"
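
The three verification one-liners in this section can be gathered into one script that reports which frameworks are importable in the active environment. This is a minimal sketch; `framework_report` is an illustrative name, not part of any framework, and it reports versions only (GPU visibility still needs the framework-specific checks shown above).

```python
import importlib
import importlib.util


def framework_report(names=("torch", "tensorflow", "jax")):
    """Return {package: version string, or None if not importable here}."""
    report = {}
    for name in names:
        # find_spec only locates the package; it does not import it.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except Exception:
            # For example an ImportError caused by a missing shared library.
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in framework_report().items():
        print(f"{name}: {version or 'not importable in this environment'}")
```

Run it once per environment (for example after `conda activate pytorch`) to confirm that each environment contains only the framework it was created for.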

Specialized Libraries

MinkowskiEngine

MinkowskiEngine is an auto-differentiation library for sparse tensors, particularly useful for 3D computer vision tasks.

Installation Steps

Create dedicated environment:

conda create -n py3-mink python=3.8
conda activate py3-mink

Install dependencies:

conda install openblas-devel -c anaconda
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge

Load required modules:
```
module load CUDA/11.3.1 gnu9
```

Submit interactive job for compilation:

srun -n 1 -c 4 --gres=gpu:1 --mem=20000 --pty /bin/bash

Install MinkowskiEngine:

conda activate py3-mink
# Note: --install-option was removed in pip 23.1, so this command needs an older pip
pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps \
    --install-option="--blas_include_dirs=${CONDA_PREFIX}/include" \
    --install-option="--blas=openblas"

Usage Example

import torch
import MinkowskiEngine as ME

# Define unique coordinates (first column is the batch index) and one feature per point
coords = torch.IntTensor([[0, 0], [0, 1], [0, 2], [1, 0], [1, 2]])
feats = torch.FloatTensor([[1], [2], [3], [4], [5]])

# Create sparse tensor
sparse_tensor = ME.SparseTensor(feats, coords)
print(f"Sparse tensor shape: {sparse_tensor.shape}")

PointGPT

PointGPT applies GPT-style auto-regressive generative pre-training to point clouds for 3D understanding tasks.

Installation Steps

Create environment:

conda create -n pointgpt python=3.8
conda activate pointgpt

Install PyTorch and dependencies:

conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 tensorboard -c pytorch -c conda-forge
pip install easydict h5py matplotlib open3d opencv-python pyyaml timm tqdm transforms3d termcolor scipy ninja plyfile numpy==1.23.4
pip install setuptools==59.5.0

Load CUDA module:
```
module load CUDA/11.3.1
```

Clone PointGPT repository:

cd /lustreFS/data/mygroup/$USER
git clone https://github.com/CGuangyan-BIT/PointGPT.git
cd PointGPT

Submit interactive job for compilation:

srun -n 1 -c 4 --gres=gpu:1 --mem=20000 --pty /bin/bash

Install extensions:

conda activate pointgpt

# Chamfer Distance & EMD
cd ./extensions/chamfer_dist
python setup.py install --user
cd ../emd
python setup.py install --user
cd ../

# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

# GPU kNN
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl

Computer Vision Libraries

OpenCV Installation

conda activate myenv
conda install opencv -c conda-forge

# Or install from pip (install only one OpenCV package per environment;
# opencv-contrib-python already includes the main modules)
pip install opencv-contrib-python

Open3D for 3D Processing

conda activate myenv
pip install open3d

# Test installation
python -c "import open3d as o3d; print(f'Open3D {o3d.__version__}')"

PIL/Pillow for Image Processing

conda install pillow
# or
pip install Pillow

Scientific Computing

NumPy, SciPy, Pandas

conda install numpy scipy pandas matplotlib seaborn
# or
pip install numpy scipy pandas matplotlib seaborn

Jupyter and IPython

conda install jupyter ipython ipykernel
# or
pip install jupyter ipython ipykernel

# Add environment to Jupyter
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

Scikit-learn

conda install scikit-learn
# or
pip install scikit-learn

Development Tools

Git and Version Control

# Git is usually available by default
git --version

# Configure Git
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

Build Tools

# Install build essentials
conda install cmake make ninja

# For C++ development
conda install gxx_linux-64 gcc_linux-64

Debugging Tools

# Install debugging tools
pip install pdbpp ipdb

# Memory profiling
pip install memory_profiler

# Line profiling
pip install line_profiler

Installation in SLURM Jobs

Interactive Installation

# Submit interactive job for installation
srun --partition=defq --qos=normal --gres=gpu:1 --mem=16000 --time=2:00:00 --pty /bin/bash

# Load modules
module load CUDA/11.3.1 Python/3.9.5

# Activate environment
conda activate myenv

# Install packages
pip install package-name

Batch Installation Script

#!/bin/bash
#SBATCH -J install_packages
#SBATCH --partition=defq
#SBATCH --qos=normal
#SBATCH --cpus-per-task=4
#SBATCH --mem=8000
#SBATCH --time=1:00:00

# Load modules
module load Python/3.9.5

# Make `conda activate` available in this non-interactive shell, then activate
source ~/miniconda3/etc/profile.d/conda.sh
conda activate myenv

# Install packages
pip install -r requirements.txt

echo "Installation completed"

Package Management Best Practices

Requirements Files

Create requirements.txt for reproducibility:

--extra-index-url https://download.pytorch.org/whl/cu117
torch==1.13.1+cu117
torchvision==0.14.1+cu117
torchaudio==0.13.1
numpy==1.23.4
pandas==1.5.2
matplotlib==3.6.2
jupyter==1.0.0

Install from requirements:

pip install -r requirements.txt
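
After installing, it can be useful to confirm that the environment actually matches the pinned versions. The sketch below is one way to do that with the standard library; `parse_pins` and `check_pins` are illustrative names, and only plain `name==version` lines are handled (option lines, URLs, and version ranges are skipped). Packages installed through conda may report a version without a local tag such as `+cu117`, which would show up here as a mismatch.

```python
from importlib import metadata


def parse_pins(text):
    """Extract {name: version} from `name==version` lines; skip everything else."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-") or "==" not in line:
            continue  # blank line, pip option, or an unpinned requirement
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (pinned, installed or None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

A typical call is `check_pins(parse_pins(open("requirements.txt").read()))`; an empty dictionary means every pinned package is installed at the pinned version.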

Environment Files

Create environment.yml for conda:

name: myproject
channels:
  - pytorch
  - nvidia
  - conda-forge
dependencies:
  - python=3.9
  - pytorch=1.13.1
  - torchvision=0.14.1
  - torchaudio=0.13.1
  - pytorch-cuda=11.7
  - numpy
  - pandas
  - matplotlib
  - jupyter
  - pip
  - pip:
    - some-pip-package

Create environment:

conda env create -f environment.yml

Storage Considerations

Install packages in shared group storage to avoid quota issues:

# Set conda environments path (keeps any existing ~/.condarc settings)
conda config --add envs_dirs /lustreFS/data/mygroup/conda/envs

# Set pip cache directory
export PIP_CACHE_DIR=/lustreFS/data/mygroup/pip-cache
echo 'export PIP_CACHE_DIR=/lustreFS/data/mygroup/pip-cache' >> ~/.bashrc
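
To see how much space environments and caches occupy, a short script can total a directory tree and report free space on its file system. This is a sketch under stated assumptions: `tree_size_bytes` and `free_fraction` are illustrative helpers, conda environments share files through hard links so the per-tree total can overstate real usage, and file-system free space does not reflect any per-user or per-group quota the cluster may enforce.

```python
import os
import shutil


def tree_size_bytes(root):
    """Sum the sizes of regular files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def free_fraction(path):
    """Fraction of the file system holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

For example, `tree_size_bytes("/lustreFS/data/mygroup/conda/envs") / 1e9` gives an approximate size in gigabytes for the environments directory configured above.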

Troubleshooting

Common Installation Issues

CUDA compatibility errors:

# Check CUDA version
nvidia-smi

# Install matching PyTorch version
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
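
The CUDA version printed in the `nvidia-smi` header is the highest CUDA runtime version the installed driver supports, so a framework build needs a CUDA version at or below it. The comparison can be sketched as follows; the function names are illustrative, and the regular expression assumes the usual `CUDA Version: X.Y` header text, which may differ between driver releases.

```python
import re
import subprocess


def parse_driver_cuda(smi_text):
    """Return the (major, minor) CUDA version from nvidia-smi's header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_supports(smi_text, wanted):
    """True if the driver's CUDA version is at least `wanted`, e.g. (11, 7)."""
    found = parse_driver_cuda(smi_text)
    return found is not None and found >= wanted


def read_nvidia_smi():
    """Capture nvidia-smi output; only works on a node where nvidia-smi exists."""
    return subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    ).stdout
```

On a GPU node, `driver_supports(read_nvidia_smi(), (11, 7))` indicates whether a `pytorch-cuda=11.7` build is expected to run with the installed driver.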

Memory errors during installation:

# Request more memory for installation
srun --mem=32000 --pty /bin/bash

# Or skip pip's cache, which can reduce memory use when installing large wheels
pip install --no-cache-dir package-name

Permission errors:

# Install in user space (outside a conda environment; --user installs to ~/.local, not into the active environment)
pip install --user package-name

# Or check conda environment ownership
ls -la ~/miniconda3/envs/

Network timeouts:

# Use conda-forge channel
conda install -c conda-forge package-name

# Or use pip with more retries and a longer per-connection timeout (in seconds)
pip install --retries 10 --timeout 60 package-name

Compilation Issues

Missing compilers:

# Load compiler modules
module load GCC/10.3.0

# Check compiler availability
gcc --version
nvcc --version
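
Before starting a long compilation, the same check can be scripted for all build tools at once. This is a minimal sketch; `missing_tools` is an illustrative name, and the default tool list is an assumption based on the tools mentioned in this guide.

```python
import shutil


def missing_tools(tools=("gcc", "g++", "nvcc", "cmake", "ninja")):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

If `missing_tools()` returns a non-empty list after loading modules and activating the environment, the named tools still need to be loaded or installed.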

Missing headers:

# Install development packages
conda install gxx_linux-64 gcc_linux-64

# For CUDA development
module load CUDA/11.3.1
echo $CUDA_HOME

Environment Conflicts

Package conflicts:

# Create fresh environment
conda create -n clean-env python=3.9
conda activate clean-env

# Install packages one by one
conda install pytorch -c pytorch

Module vs conda conflicts:

# Always load modules before activating conda
module load Python/3.9.5
conda activate myenv
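
When a module and a conda environment both provide Python, it helps to confirm which interpreter is actually running. The sketch below checks whether the interpreter sits inside the active conda environment; `interpreter_in_conda_env` is an illustrative helper, and it relies on conda setting the `CONDA_PREFIX` environment variable on activation.

```python
import os
import sys


def interpreter_in_conda_env(executable=None, conda_prefix=None):
    """True if the interpreter lives inside the active conda environment."""
    executable = executable or sys.executable
    conda_prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False  # no conda environment is active
    exe = os.path.realpath(executable)
    prefix = os.path.realpath(conda_prefix)
    # commonpath compares whole path components, so /envs/myenv2 does not
    # count as being inside /envs/myenv.
    return os.path.commonpath([exe, prefix]) == prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside active conda env: {interpreter_in_conda_env()}")
```

If this prints `False` after `conda activate myenv`, the `python` on `PATH` is probably the one supplied by a loaded module rather than by the environment.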

Package Documentation

Keep track of installed packages:

# List conda packages
conda list > conda_packages.txt

# List pip packages
pip freeze > pip_requirements.txt

# Environment information
conda info --envs > environments.txt

Next Steps

Learn job submission: Job Submission
Explore GPU programming: GPU Computing
Set up monitoring: Performance Monitoring

Last modified June 10, 2025: Update docs (3983260)