Recent posts
31 Oct 2025
TL/DR
If you’re working with large language models (LLMs) on systems like the DGX Spark, and encountering “out of memory” errors despite having seemingly ample RAM (e.g., 128GB for a 7B parameter model), the culprit might be your operating system’s caching mechanisms. The solution is often as simple as dropping system caches.
- DGX Spark uses UMA (Unified Memory Architecture): CPU and GPU share the same memory.
- OS Caching: The OS aggressively uses memory for caches, which might not be visible to GPU tools.
- CUDA vs. Actual Usage: DGX Dashboard’s memory usage (via CUDA API) might show high usage even without a model loaded due to OS caches.
- The Fix: Clear system caches with `sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'`.
- This is mentioned in the NVIDIA docs - check it here
More …
28 Oct 2025
TL/DR
The NVIDIA DGX Spark is a powerful little devbox for local model development, boasting 128GB of unified memory despite its compact size. To truly unleash its potential with tools like Unsloth, you need to navigate a few key challenges:
- Official NVIDIA PyTorch Image is Key: Leverage NVIDIA’s optimized PyTorch Docker image for maximum performance on the GB10 chip.
- UV for Dependency Management: Use `uv` to create a virtual environment, allowing you to pin specific library versions while utilizing the optimized PyTorch from the base image.
- Block PyTorch with UV: Prevent `uv` from reinstalling PyTorch by using `override-dependencies` in your `pyproject.toml`.
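One common shape for that override uses an impossible environment marker so `uv` never resolves its own copy of the packages the base image already provides. A sketch; the marker trick and the exact package list are assumptions here, the linked repo has the working version:

```toml
[tool.uv]
# The base NVIDIA image already ships an optimized torch build;
# an unsatisfiable marker stops uv from ever reinstalling it.
override-dependencies = [
    "torch; sys_platform == 'never'",
]
```

With this in place, dependencies that declare `torch` as a requirement resolve cleanly without pulling in a second, unoptimized wheel.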
- TORCH_CUDA_ARCH_LIST Override: Set `TORCH_CUDA_ARCH_LIST` to 12.0 (or unset it) so that `xformers` compiles successfully.
- Custom xformers Build: Install `xformers` from a custom source branch that supports CUDA 12.1 until the official merge.
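The arch-list override and the source build go together; a hedged shell sketch (the branch URL is elided here, and `--no-build-isolation` is an assumption so the build sees the base image's torch):

```shell
# Blackwell (DGX Spark) reports compute capability 12.x; pin the arch
# list so nvcc targets the right SMs during the xformers build.
export TORCH_CUDA_ARCH_LIST=12.0

# Then build xformers from the custom branch, e.g.:
#   pip install -v --no-build-isolation git+<custom-xformers-branch>
echo "TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
```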
- Upgrades: When upgrading the base image, the virtual environment needs to be recreated.
- Full repo with code: code is here
More …