Making Deepspeed work on Lambda Labs GPU On-Demand Instance

Gundeep Singh
5 min read · Nov 2, 2024


[Image: 1-click cluster visualization screenshot from the Lambda Labs website]

Lambda Labs on-demand instances come with the Nvidia driver and CUDA toolkit already installed on their Linux machines. The setup, however, is different from a usual CUDA installation, so libraries that expect things to be set up in the usual paths can behave differently. Here are some steps to debug and make things work in such scenarios.

1. Understanding & Installing the Nvidia Driver and CUDA Toolkit

There is only one best Nvidia Driver for your system

When you have a specific printer model and a specific operating system, it is very easy to download printer drivers. All you have to do is select the OS and printer model on the vendor website, and it gives you a list of drivers. You never download an old driver, always the latest one. It’s similar with Nvidia drivers: in most cases, you simply want to choose the latest Nvidia driver compatible with your GPU model and Linux OS. Example GPU model and Linux OS:

GPU model: GA100 [A100 SXM4 80GB] (ubuntu-drivers devices | grep model)
Linux OS: Ubuntu 22.04 x86_64 (uname -vp)

So for driver installation you have one best option: the most recent Nvidia proprietary driver.
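
On Ubuntu, a minimal sketch of that check-and-install flow (assuming the ubuntu-drivers tool is present, as it is on standard Ubuntu images):

# list detected GPUs and the recommended driver
ubuntu-drivers devices

# install the recommended (latest compatible) proprietary driver
sudo ubuntu-drivers autoinstall

# reboot, then confirm the driver loaded
nvidia-smi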

Multiple versions of the Nvidia CUDA Toolkit can be compatible with your Nvidia driver

Depending on your application, you may want to install CUDA 12.4, 12.5, 12.6, etc. Each newer version adds features and deprecates others, so frameworks/libs like PyTorch, Transformers, and DeepSpeed can each depend on specific CUDA versions.

First, find the latest CUDA version compatible with your tools/libs/frameworks. Then check what’s already installed. Note that nvidia-smi reports the driver version and the highest CUDA version that driver supports, while nvcc --version reports the CUDA toolkit actually installed. If the toolkit you need isn’t installed, install it using the official guide.
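
For example:

# driver version and the highest CUDA version the driver supports
nvidia-smi

# the CUDA toolkit version actually installed (if any)
nvcc --version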

2. Install common development libs

sudo apt update
sudo apt-get install build-essential python-dev python-setuptools python-pip python-smbus -y || sudo apt-get install build-essential python3-dev python3-setuptools python3-pip python3-smbus -y
sudo apt-get install libncursesw5-dev libgdbm-dev libc6-dev -y
sudo apt-get install zlib1g-dev libsqlite3-dev tk-dev -y
sudo apt-get install libssl-dev openssl -y
sudo apt-get install libffi-dev -y
sudo apt install libreadline-dev -y
sudo apt install -y libaio-dev # async io
sudo apt install -y python3-dev
sudo apt install libbz2-dev xz-utils libxml2-dev libxmlsec1-dev liblzma-dev -y

# NOTE: update python version in following if needed
sudo apt install -y libpython3.11-dev
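
Since DeepSpeed’s async_io op depends on libaio, a quick sanity check that the linker can now see it:

# should list libaio if the dev package installed correctly
ldconfig -p | grep libaio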

3. Making DeepSpeed work

There are 3 steps for this part:

3.1 Locate the CUDA installation path

You can use fd for it.

# apt install fd-find (the binary may be installed as fdfind on Ubuntu)
ubuntu@lambdalabs:~ fd libnvvm.so / | grep lib64
/usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so
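
If fd isn’t available, plain find works too:

# search system paths for the CUDA compiler libraries
find /usr -name 'libnvvm.so*' 2>/dev/null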

3.2 Export CUDA_HOME and LD_LIBRARY_PATH

echo 'export CUDA_HOME="/usr/lib/nvidia-cuda-toolkit"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib/nvidia-cuda-toolkit/lib64"' >> ~/.bashrc
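
Then reload your shell config and verify the path actually resolves:

source ~/.bashrc
echo $CUDA_HOME
ls $CUDA_HOME/lib64/libnvvm.so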

3.3 Make sure all ops are compatible with JIT installation

Which DeepSpeed config you use decides which DeepSpeed ops are required. For example, CPU offload requires the cpu_adam op.

Each DeepSpeed op is:

  • either a custom implementation using raw CUDA code,
  • or built on top of cuBLAS, cuDNN, CUTLASS, etc. (higher-level libraries built on top of CUDA that provide specialized functionality for specific domains like linear algebra and deep learning).

Installing DeepSpeed doesn’t install all the ops referenced by your DeepSpeed config, like this ZeRO stage 2 example:

{
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 4,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    },
    "contiguous_gradients": true,
    "overlap_comm": true
  },
  "zero_allow_untested_optimizer": true
}

Instead, DeepSpeed installs ops just in time (JIT) using ninja. Because DeepSpeed handles installing ops automatically, we just have to make sure that the ops are compatible with our system. We can do this by running

ds_report

which prints output like this:

ubuntu@lambdalabs:~ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] FP Quantizer is using an untested triton version (3.0.0), only 2.3.0 and 2.3.1 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ubuntu/.cache/pypoetry/virtualenvs/voiceai-nlp-llms-5N74WdyD-py3.11/lib/python3.11/site-packages/torch']
torch version .................... 2.4.1+cu124
deepspeed install path ........... ['/home/ubuntu/.cache/pypoetry/virtualenvs/voiceai-nlp-llms-5N74WdyD-py3.11/lib/python3.11/site-packages/deepspeed']
deepspeed info ................... 0.14.5, unknown, unknown
torch cuda version ............... 12.4
torch hip version ................ None
nvcc version ..................... 12.4
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 885.84 GB

Make sure that each op you plan to use in your DeepSpeed config shows as compatible in the report. If not, fix it by installing the correct lib as suggested by the warning in the report.
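
To test a specific op ahead of a long training run, you can trigger its JIT build directly. A sketch for cpu_adam, assuming the builder names match the op names in ds_report:

# builds and loads the cpu_adam extension; fails loudly if dependencies are missing
python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"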

Following these steps should make sure that your DeepSpeed installation works as expected.
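
If you’d rather not depend on JIT compilation at run time, DeepSpeed also supports pre-building individual ops at install time through DS_BUILD_* environment variables:

# pre-build cpu_adam while installing (DS_BUILD_OPS=1 tries to build everything)
DS_BUILD_CPU_ADAM=1 pip install deepspeed --no-cache-dir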

If there are errors while running DeepSpeed, it’s best to start from the first line of the error to figure out the root cause and identify which of these steps needs a change.

Final thoughts

It’s very tempting to wipe the Nvidia drivers and CUDA toolkit and do a fresh install, which often works too. However, in my opinion, it’s not a sustainable approach and can lead to a continued lack of understanding of how these libraries, frameworks, & tools work together. So when possible, it’s best to dissect the problems and fix only what needs a fix. That can help you gain a deeper understanding of your tech stack and save time in the long run.
