Recent posts

Oh no, 128GB is not enough for 7B parameters!

TL;DR

If you’re working with large language models (LLMs) on systems like the DGX Spark and encountering “out of memory” errors despite seemingly ample RAM (e.g., 128GB for a 7B parameter model), the culprit might be your operating system’s caching mechanisms. The solution is often as simple as dropping the system caches.

  • DGX Spark uses UMA (Unified Memory Architecture): CPU and GPU share the same memory.
  • OS Caching: The OS aggressively uses memory for caches, which might not be visible to GPU tools.
  • CUDA vs. Actual Usage: DGX Dashboard’s memory usage (via CUDA API) might show high usage even without a model loaded due to OS caches.
  • The Fix: Clear system caches with sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' (see the sketch after this list).
  • This behavior is mentioned in the NVIDIA docs - check it here
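
A minimal before/after sketch of the fix, assuming a stock DGX OS shell; free and drop_caches behave the same way on any recent Linux:

```bash
# Before: buff/cache can silently hold tens of GB of the 128GB UMA pool.
free -h

# Flush dirty pages to disk, then drop the page cache, dentries, and inodes.
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'

# After: the freed memory is visible to CUDA allocations again.
free -h
```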
More …

Unsloth your DGX Spark

TL;DR

The NVIDIA DGX Spark is a powerful little devbox for local model development, boasting 128GB of unified memory despite its compact size. To truly unleash its potential with tools like Unsloth, you need to navigate a few key challenges:

  • Official NVIDIA PyTorch Image is Key: Leverage NVIDIA’s optimized PyTorch Docker image for maximum performance on the GB10 chip.
  • UV for Dependency Management: Use uv to create a virtual environment, allowing you to pin specific library versions while utilizing the optimized PyTorch from the base image.
  • Block PyTorch with UV: Prevent uv from reinstalling PyTorch by using override-dependencies in your pyproject.toml (see the first sketch after this list).
  • TORCH_CUDA_ARCH_LIST Override: Set TORCH_CUDA_ARCH_LIST to 12.0 (or unset it entirely) so that xformers compiles successfully.
  • Custom xformers Build: Install xformers from a custom source branch that supports CUDA 12.1 until it is merged upstream (see the build sketch after this list).
  • Upgrades: When upgrading the base image, the virtual environment needs to be recreated.
  • Full repo with code: available here
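
A minimal sketch of the uv setup described above, assuming you start inside NVIDIA’s PyTorch container; the image tag and the environment-marker trick are illustrative assumptions, not taken from the post:

```bash
# Inside NVIDIA's PyTorch container (the image tag is illustrative):
#   docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.xx-py3

# Create a venv that can still see the container's optimized PyTorch.
uv venv --system-site-packages

# One common trick: override torch with an environment marker that never
# matches, so uv keeps torch in the dependency graph but never reinstalls
# it over the base image's build.
cat >> pyproject.toml <<'EOF'

[tool.uv]
override-dependencies = ["torch; sys_platform == 'never'"]
EOF

uv sync   # resolves everything else; torch stays untouched
```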
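And a sketch of the xformers build step; the fork and branch below are placeholders for the branch linked in the full post, and the arch value mirrors the bullet above:

```bash
# Set the compute capability before building, as the bullets above describe.
export TORCH_CUDA_ARCH_LIST=12.0

# Build xformers from the custom branch. <fork> and <branch> are
# placeholders; use the branch linked in the full post.
uv pip install --no-build-isolation -v \
    "xformers @ git+https://github.com/<fork>/xformers.git@<branch>"
```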
More …