Making Deepspeed work on Lambda Labs GPU On-Demand Instance

Gundeep Singh
5 min read · Nov 2, 2024


[Image: 1-click cluster visualization screenshot from the Lambda Labs website]

Lambda Labs on-demand instances come with the Nvidia driver and CUDA toolkit already installed on their Linux machines. The setup, however, is different from a usual CUDA installation, so libraries that expect things to be set up in the usual paths can behave differently. Here are some steps to debug and make things work in such scenarios.

1. Understanding & Installing the Nvidia Driver and CUDA Toolkit

There is only one best Nvidia Driver for your system

When you have a specific printer model and a specific operating system, it is very easy to download printer drivers. All you have to do is select the OS and printer model on the vendor website, and it gives you a list of drivers. You never download an old driver, always the latest one. It’s similar with Nvidia drivers: in most cases, you simply want to choose the latest Nvidia driver compatible with your GPU model and Linux OS. Example GPU model and Linux OS:

GPU model: GA100 [A100 SXM4 80GB] (ubuntu-drivers devices | grep model)
Linux OS: Ubuntu 22.04 x86_64 (uname -vp)

So for driver installation you have one best option: the most recent Nvidia proprietary driver.
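
On Ubuntu, a minimal sketch of that check-and-install flow (assuming the ubuntu-drivers tool is present, as it is on standard Ubuntu images):

# list detected GPUs and the recommended driver
ubuntu-drivers devices

# install the recommended (latest compatible) proprietary driver
sudo ubuntu-drivers autoinstall

# reboot, then confirm the driver loaded
nvidia-smi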

Multiple versions of the Nvidia CUDA Toolkit can be compatible with your Nvidia driver

Depending on your application, you may want to install CUDA 12.4, 12.5, 12.6, etc. Each newer version adds features and deprecates others, so frameworks/libs like PyTorch, Transformers, and DeepSpeed can each depend on specific CUDA versions.

First, find the latest CUDA version compatible with your tools/libs/frameworks. Then check what’s already installed. Note that nvidia-smi reports the driver version and the highest CUDA version that driver supports, while nvcc --version reports the CUDA toolkit actually installed. If the toolkit you need isn’t installed, install it using the official guide.
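
For example:

# driver version and the highest CUDA version the driver supports
nvidia-smi

# the CUDA toolkit version actually installed (if any)
nvcc --version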

2. Install common development libs

sudo apt update
sudo apt-get install build-essential python-dev python-setuptools python-pip python-smbus -y || sudo apt-get install build-essential python3-dev python3-setuptools python3-pip python3-smbus -y
sudo apt-get install libncursesw5-dev libgdbm-dev libc6-dev -y
sudo apt-get install zlib1g-dev libsqlite3-dev tk-dev -y
sudo apt-get install libssl-dev openssl -y
sudo apt-get install libffi-dev -y
sudo apt install libreadline-dev -y
sudo apt install -y libaio-dev # async io
sudo apt install -y python3-dev
sudo apt install libbz2-dev xz-utils libxml2-dev libxmlsec1-dev liblzma-dev -y

# NOTE: update python version in following if needed
sudo apt install -y libpython3.11-dev
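
Since DeepSpeed’s async_io op depends on libaio, a quick sanity check that the linker can now see it:

# should list libaio if the dev package installed correctly
ldconfig -p | grep libaio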

3. Making DeepSpeed work

There are 3 steps for this part:

3.1 Locate the CUDA installation path

You can use fd for it.

# apt install fd-find (the binary may be installed as fdfind on Ubuntu)
ubuntu@lambdalabs:~ fd libnvvm.so / | grep lib64
/usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so
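
If fd isn’t available, plain find works too:

# search system paths for the CUDA compiler libraries
find /usr -name 'libnvvm.so*' 2>/dev/null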

3.2 Export CUDA_HOME and LD_LIBRARY_PATH

echo 'export CUDA_HOME="/usr/lib/nvidia-cuda-toolkit"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib/nvidia-cuda-toolkit/lib64"' >> ~/.bashrc
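
Then reload your shell config and verify the path actually resolves:

source ~/.bashrc
echo $CUDA_HOME
ls $CUDA_HOME/lib64/libnvvm.so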

3.3 Make sure all ops are compatible with JIT installation

Which DeepSpeed config you use decides which DeepSpeed ops are required. For example, CPU offload requires the cpu_adam op.

Each DeepSpeed op is:

  • either a custom implementation using raw CUDA code,
  • or built on top of cuBLAS, cuDNN, CUTLASS, etc. (higher-level libraries built on top of CUDA that provide specialized functionality for specific domains like linear algebra and deep learning).

Installing DeepSpeed doesn’t install all the ops referenced by your DeepSpeed config, like this ZeRO stage 2 example:

{
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 4,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    },
    "contiguous_gradients": true,
    "overlap_comm": true
  },
  "zero_allow_untested_optimizer": true
}

Instead, DeepSpeed installs ops just in time (JIT) using ninja. Because DeepSpeed handles installing ops automatically, we just have to make sure that the ops are compatible with our system. We can do this by running

ds_report

which prints output like this:

ubuntu@lambdalabs:~ ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] FP Quantizer is using an untested triton version (3.0.0), only 2.3.0 and 2.3.1 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ubuntu/.cache/pypoetry/virtualenvs/voiceai-nlp-llms-5N74WdyD-py3.11/lib/python3.11/site-packages/torch']
torch version .................... 2.4.1+cu124
deepspeed install path ........... ['/home/ubuntu/.cache/pypoetry/virtualenvs/voiceai-nlp-llms-5N74WdyD-py3.11/lib/python3.11/site-packages/deepspeed']
deepspeed info ................... 0.14.5, unknown, unknown
torch cuda version ............... 12.4
torch hip version ................ None
nvcc version ..................... 12.4
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 885.84 GB

Make sure that each op you plan to use in your DeepSpeed config shows as compatible in the report. If not, fix it by installing the correct lib as suggested by the warning in the report.
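
To test a specific op ahead of a long training run, you can trigger its JIT build directly. A sketch for cpu_adam, assuming the builder names match the op names in ds_report:

# builds and loads the cpu_adam extension; fails loudly if dependencies are missing
python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"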

Following these steps should make sure that your DeepSpeed installation works as expected.
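
If you’d rather not depend on JIT compilation at run time, DeepSpeed also supports pre-building individual ops at install time through DS_BUILD_* environment variables:

# pre-build cpu_adam while installing (DS_BUILD_OPS=1 tries to build everything)
DS_BUILD_CPU_ADAM=1 pip install deepspeed --no-cache-dir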

If there are errors while running DeepSpeed, it’s best to start from the first line of the error to figure out the root cause and identify which of these steps needs a change.

Final thoughts

It’s very tempting to wipe the Nvidia drivers and CUDA toolkit and do a fresh install, which often works too. However, in my opinion, it’s not a sustainable approach and can lead to a continued lack of understanding of how these libraries, frameworks, & tools work together. So when possible, it’s best to dissect the problems and fix only what needs a fix. That can help you gain a deeper understanding of your tech stack and save time in the long run.
