pycunls Installation#

Prerequisites#

CUDA Toolkit (with nvcc, cudart, cuBLAS, cuSPARSE, cuSOLVER)
CMake >= 3.24
C++17 compiler
Python >= 3.10
NVIDIA GPU driver compatible with your CUDA Toolkit

Install from source (pip)#

From the python/ directory inside the repository:

cd python
pip install .

This uses scikit-build-core to configure CMake with -DBUILD_PYTHON_BINDINGS=ON -DBUILD_SHARED_LIBS=OFF, compiles the _pycunls_core native extension via nanobind, statically links the cuNLS core library, and installs the pycunls package into your Python environment.

To include the optional NVIDIA Warp integration and test dependencies:

pip install ".[all]"

Or selectively:

pip install ".[warp]"   # Warp support only
pip install ".[test]"   # test dependencies (pytest, cupy, warp-lang)

Build a wheel#

cd python
pip wheel . -w dist/

The resulting .whl file in dist/ can be installed on any machine with a compatible CUDA Toolkit and GPU driver.

Dependencies#

Required (installed automatically by pip):

cupy-cuda12x — GPU arrays and CUDA interop. All pycunls constructors accept either a cupy.ndarray or a raw int device pointer.

Optional:

warp-lang — NVIDIA Warp for authoring custom factor and state kernels in Python (see Python Tutorial). Install with pip install warp-lang or pip install "pycunls[warp]".

Build-time only:

scikit-build-core — CMake build backend for Python packaging.
nanobind — lightweight C++/Python binding library (fetched by CMake during build).

Verify the installation#

python -c "import pycunls; print(pycunls.__version__)"

You should see the installed version string (e.g. 0.1.0).