State API#

The state module provides batched storage and manifold updates for optimization variables. State batches implement the Plus (retraction) operation so the solver can update states in tangent space while keeping them on the manifold.

C++: cunls/state. Python: pycunls.

Manifolds#

What is a manifold?

Many variables in nonlinear least squares do not live in \(\mathbb{R}^n\) but on curved spaces: 2D/3D rotations (SO(2), SO(3)), rigid or similarity transforms (SE(2), SE(3), Sim(2), Sim(3)), projective linear groups (SL(4)), or other constrained sets. Such a space is a manifold: at each point \(x\) there is a tangent space (a linear space of “directions”) whose dimension is the intrinsic dimension of the manifold. The ambient space is the larger Euclidean space in which the manifold is embedded (e.g. 3×3 matrices for SO(3), so ambient dimension 9).

Why use manifolds?

Constraint satisfaction: Updates are applied in the tangent space and then mapped back onto the manifold, so the state never leaves the constraint set (e.g. rotation matrices stay orthogonal).
Correct dimension: The solver only works with as many unknowns as the tangent dimension (e.g. 3 for SO(3) instead of 9), which improves numerics and efficiency.
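The two points above can be illustrated with a small CPU-only sketch for SO(2). It is not part of cunls; the helper names (`so2_matrix`, `orthogonality_error`) are hypothetical. It compares adding a step directly to the four matrix entries with applying a one-dimensional tangent step and mapping it back onto the manifold.

```python
import math

def so2_matrix(theta):
    """Row-major 2x2 rotation matrix [cos, -sin, sin, cos]."""
    c, s = math.cos(theta), math.sin(theta)
    return [c, -s, s, c]

def orthogonality_error(r):
    """Largest absolute entry of R^T R - I for a row-major 2x2 matrix."""
    a, b, c, d = r
    rtr = [a * a + c * c, a * b + c * d, a * b + c * d, b * b + d * d]
    return max(abs(rtr[0] - 1.0), abs(rtr[1]), abs(rtr[2]), abs(rtr[3] - 1.0))

R = so2_matrix(0.3)

# Naive ambient update: add a 4-vector step directly to the matrix entries.
ambient_step = [0.1, -0.05, 0.02, 0.08]
R_naive = [r + d for r, d in zip(R, ambient_step)]

# Tangent update: one angle increment, mapped back via Exp (a rotation),
# then right-multiplied onto the current state.
delta = 0.1
e = so2_matrix(delta)
R_tangent = [
    R[0] * e[0] + R[1] * e[2], R[0] * e[1] + R[1] * e[3],
    R[2] * e[0] + R[3] * e[2], R[2] * e[1] + R[3] * e[3],
]

naive_err = orthogonality_error(R_naive)      # clearly non-zero
tangent_err = orthogonality_error(R_tangent)  # zero up to rounding
```

The naive update needs four unknowns and leaves the rotation group; the tangent update needs one unknown and stays on it.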

Plus (retraction)

The Plus operation (often written \(\oplus\) or \(\boxplus\) in the literature) takes a point \(x\) on the manifold and a tangent vector \(\Delta\) and returns a new point on the manifold:

\[x \oplus \Delta = \mathrm{Plus}(x,\, \Delta)\]

The solver computes an update \(\Delta\) in tangent space (e.g. from a Gauss-Newton or Levenberg-Marquardt step) and then sets \(x_{\mathrm{new}} = x \oplus \Delta\). For Euclidean space, \(x \oplus \Delta = x + \Delta\). For the Lie groups documented here (SO, SE, Sim, SL), Plus is right-multiplication by the exponential of the Lie algebra element: \(x \oplus \Delta = x \cdot \mathrm{Exp}(\Delta)\).
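As a concrete instance of \(x \oplus \Delta = x \cdot \mathrm{Exp}(\Delta)\), the following CPU-only sketch implements Plus for SO(3) using Rodrigues' formula. It is a reference illustration under the assumption that \(\Delta\) is a rotation vector; the function names are hypothetical and this is not the library's GPU kernel.

```python
import math

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def skew(w):
    """Map a 3-vector to its skew-symmetric matrix."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def so3_exp(w):
    """Rodrigues' formula: Exp(w) = I + a*K + b*K^2 with K = skew(w)."""
    theta = math.sqrt(sum(c * c for c in w))
    K = skew(w)
    K2 = matmul3(K, K)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # small-angle limits of the two coefficients
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def so3_plus(x, delta):
    """Right-multiplicative retraction: x (+) delta = x * Exp(delta)."""
    return matmul3(x, so3_exp(delta))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Quarter turn about the z axis applied to the identity rotation.
x_new = so3_plus(identity, [0.0, 0.0, math.pi / 2])
```

Here `x_new` is the rotation that maps the x axis onto the y axis, and it remains orthogonal by construction.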

How the minimizer uses state batches

The minimizer holds the current state \(x\) in ambient storage. It solves for a tangent update \(\Delta\), using Jacobians taken with respect to the tangent space, and then calls StateBatch::Plus() (or StateBatchOps::Plus() across multiple batches) to write \(x \oplus \Delta\) back into the state buffer. The state batch is therefore the object that knows how to apply \(\oplus\) for its manifold.
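The loop described above can be sketched on a toy problem: find the 2D rotation that maps a point p onto a point q. This is a hand-written Gauss-Newton iteration for illustration only, not the cunls minimizer; the state is a row-major 2×2 matrix, the tangent update is a single angle, and all names are hypothetical.

```python
import math

def so2_plus(x, delta):
    """x (+) delta = x * Exp(delta) for a row-major 2x2 rotation matrix."""
    c, s = math.cos(delta), math.sin(delta)
    return [x[0] * c + x[1] * s, -x[0] * s + x[1] * c,
            x[2] * c + x[3] * s, -x[2] * s + x[3] * c]

def apply(x, p):
    """Multiply a row-major 2x2 matrix with a 2-vector."""
    return [x[0] * p[0] + x[1] * p[1], x[2] * p[0] + x[3] * p[1]]

p = [1.0, 0.0]
target = 0.7
q = [math.cos(target), math.sin(target)]

x = [1.0, 0.0, 0.0, 1.0]  # ambient storage of the current state
for _ in range(10):
    rp = apply(x, p)
    r = [rp[0] - q[0], rp[1] - q[1]]  # residual x*p - q
    # Jacobian w.r.t. the tangent angle at delta = 0:
    # d/d(delta) [x * Exp(delta) * p] = x * G * p with G = [[0, -1], [1, 0]].
    J = apply(x, [-p[1], p[0]])
    # Gauss-Newton step for a single unknown: delta = -(J^T r) / (J^T J).
    delta = -(J[0] * r[0] + J[1] * r[1]) / (J[0] * J[0] + J[1] * J[1])
    x = so2_plus(x, delta)  # write x (+) delta back into the state

estimated_angle = math.atan2(x[2], x[0])
```

The solver only ever sees one unknown per block, while the stored state stays a valid rotation matrix after every update.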

StateBatch Interface#

size_t TangentSize() const#

Returns: [out] Tangent-space dimension per state block.

size_t AmbientSize() const#

Returns: [out] Ambient/storage dimension per state block.

size_t NumStateBlocks() const#

Returns: [out] Number of state blocks in this batch.

void Plus( const float *x, const float *delta, float *x_plus_delta, cudaStream_t stream )#

Computes \(x_{\mathrm{out}} = x \oplus \delta\) for each block in the batch.

Parameters:

x – [in] Device pointer to the current state values (ambient).
delta – [in] Device pointer to tangent-space updates.
x_plus_delta – [out] Device pointer to updated state values (ambient).
stream – [in] CUDA stream for asynchronous execution.

Returns:

[out] No return value.
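The per-block semantics of Plus() can be mirrored on the CPU as follows. This sketch assumes that `x` holds NumStateBlocks × AmbientSize contiguous floats and that `delta` holds one tangent vector per block in the same block order; the second point is an assumption, since the layout of `delta` is not spelled out here. `batch_plus` and `so2_block_plus` are hypothetical names.

```python
import math

def so2_block_plus(x, d):
    """Plus for one SO(2) block: row-major 2x2 matrix times Exp(angle)."""
    c, s = math.cos(d[0]), math.sin(d[0])
    return [x[0] * c + x[1] * s, -x[0] * s + x[1] * c,
            x[2] * c + x[3] * s, -x[2] * s + x[3] * c]

def batch_plus(x, delta, ambient_size, tangent_size, block_plus):
    """Apply block_plus to every block of two flat, contiguous buffers."""
    num_blocks = len(x) // ambient_size
    if len(delta) != num_blocks * tangent_size:
        raise ValueError("delta must hold one tangent vector per block")
    out = [0.0] * len(x)
    for b in range(num_blocks):
        xb = x[b * ambient_size:(b + 1) * ambient_size]
        db = delta[b * tangent_size:(b + 1) * tangent_size]
        out[b * ambient_size:(b + 1) * ambient_size] = block_plus(xb, db)
    return out

# Three identity rotations, each updated by its own angle.
x = [1.0, 0.0, 0.0, 1.0] * 3
delta = [0.0, 0.1, -0.2]
x_plus_delta = batch_plus(x, delta, 4, 1, so2_block_plus)
```

On the GPU each block would typically be handled by its own thread; the sequential loop here only documents the indexing.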

float *StateBlockDevicePtr(size_t state_block_idx)#

Parameters: state_block_idx – [in] Zero-based index of state block.
Returns: [out] Mutable device pointer for the selected block, or nullptr when out-of-range.

const float *StateBlockDevicePtr(size_t state_block_idx) const#

Parameters: state_block_idx – [in] Zero-based index of state block.
Returns: [out] Const device pointer for the selected block, or nullptr when out-of-range.
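Given the contiguous layout of AmbientSize floats per block, the returned address can plausibly be pictured as a fixed stride from the base pointer. The sketch below assumes exactly that (base plus index × AmbientSize × 4 bytes); it is an inference from the storage description, not a statement about the actual implementation, and the base address is made up.

```python
FLOAT_BYTES = 4  # states are stored as single-precision floats

def state_block_address(base_address, ambient_size, num_blocks, index):
    """Byte address of block `index`, or 0 (null) when out of range."""
    if index < 0 or index >= num_blocks:
        return 0
    return base_address + index * ambient_size * FLOAT_BYTES

base = 0x7F0000000000  # made-up device address, for illustration only
addr = state_block_address(base, ambient_size=16, num_blocks=10, index=3)
out_of_range = state_block_address(base, ambient_size=16, num_blocks=10, index=10)
```

For an SE(3) batch (16 floats per block), block 3 would start 192 bytes after the base under this assumption.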

const int *ConstStateIds() const#

Returns: [out] Device pointer to constant-state indices, or nullptr when none are set.

size_t NumConstStateBlocks() const#

Returns: [out] Number of constant (non-optimized) state blocks.

State batch types (tables)#

Each state batch type corresponds to a manifold. The table columns are: Plus formula, Ambient dimension, Tangent dimension, Ambient space description, Tangent space description, and Memory layout of one state block in device memory.

SizedStateBatch<AmbientDim, TangentDim>#

Generic base with compile-time ambient and tangent dimensions. Storage layout: contiguous blocks, each of AmbientDim floats. Derived classes implement Plus() for their manifold.

VectorStateBatch<Dim>#

Header: cunls/state/vector_state_batch.h

Euclidean vector state (e.g. landmarks, biases). Tangent and ambient spaces coincide.

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x + \delta\)	\(\mathrm{Dim}\)	\(\mathrm{Dim}\)	\(\mathbb{R}^{\mathrm{Dim}}\)	\(\mathbb{R}^{\mathrm{Dim}}\)	\(\mathrm{Dim}\) floats per block, contiguous

Constructors: Same as SizedStateBatch with both dimensions equal to Dim. See Constructors below.

SO2StateBatch#

Header: cunls/state/so2_state_batch.h

2D rotations (heading angle). Tangent = 1 (angle in radians).

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x \cdot \mathrm{Exp}(\delta)\)	4	1	2×2 rotation matrix	angle (radians)	row-major 2×2: \([\cos\theta,\, -\sin\theta,\, \sin\theta,\, \cos\theta]\)

SO3StateBatch#

Header: cunls/state/so3_state_batch.h

3D rotations. Tangent = 3 (axis-angle / rotation vector).

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x \cdot \mathrm{Exp}(\mathrm{skew}(\delta))\)	9	3	3×3 rotation matrix	3D rotation vector	row-major 3×3 (9 floats)

SE2StateBatch#

Header: cunls/state/se2_state_batch.h

2D rigid transform (rotation + translation). Tangent = 3 (\(v_x,\, v_y\), angle).

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x \cdot \mathrm{Exp}(\delta)\)	9	3	3×3 homogeneous matrix	\([v_x,\, v_y,\, \theta]\)	row-major 3×3: \([\cos\theta,\, -\sin\theta,\, t_x,\, \sin\theta,\, \cos\theta,\, t_y,\, 0,\, 0,\, 1]\)
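For reference, the SE(2) exponential and the row-major layout from the table can be written out on the CPU. The sketch uses the standard closed-form exponential for the tangent ordering \([v_x,\, v_y,\, \theta]\); whether the library evaluates exactly this closed form is an assumption, and the function names are hypothetical.

```python
import math

def se2_exp(delta):
    """Exp of [vx, vy, theta] as a row-major 3x3 homogeneous matrix."""
    vx, vy, theta = delta
    if abs(theta) < 1e-8:
        a, b = 1.0, 0.5 * theta  # limits of sin(t)/t and (1 - cos(t))/t
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta
    c, s = math.cos(theta), math.sin(theta)
    tx = a * vx - b * vy
    ty = b * vx + a * vy
    return [c, -s, tx, s, c, ty, 0.0, 0.0, 1.0]

def matmul_row_major(a, b, n):
    """Product of two n x n matrices stored as flat row-major lists."""
    return [sum(a[i * n + k] * b[k * n + j] for k in range(n))
            for i in range(n) for j in range(n)]

def se2_plus(x, delta):
    """x (+) delta = x * Exp(delta)."""
    return matmul_row_major(x, se2_exp(delta), 3)

identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
# Drive forward along x while turning left by 90 degrees: a unit-radius arc.
T = se2_plus(identity, [math.pi / 2, 0.0, math.pi / 2])
```

Because translation and rotation are coupled in the exponential, the arc ends at (1, 1) rather than at (π/2, 0).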

SE3StateBatch#

Header: cunls/state/se3_state_batch.h

3D rigid transform (rotation + translation). Tangent = 6 (twist: rotation vector + translation).

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x \cdot \mathrm{Exp}(\delta)\)	16	6	4×4 homogeneous matrix	6D twist \([\omega; \rho]\)	row-major 4×4: \([R\,\|\,t;\; 0\; 0\; 0\; 1]\) (16 floats)
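A CPU reference for the SE(3) update may help when checking buffers by hand. The sketch below uses the standard closed-form exponential for a twist ordered \([\omega; \rho]\) and flattens the result to the 16-float row-major layout. It assumes the library follows this standard form; the names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def se3_exp(delta):
    """Exp of a twist [omega; rho] as a 4x4 nested-list matrix."""
    wx, wy, wz, rx, ry, rz = delta
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    K = [[0.0, -wz, wy], [wz, 0.0, -wx], [-wy, wx, 0.0]]
    K2 = matmul(K, K)
    if theta < 1e-4:
        a, b, c = 1.0, 0.5, 1.0 / 6.0  # leading Taylor terms
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
        c = (theta - math.sin(theta)) / theta ** 3
    rho = [rx, ry, rz]
    T = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        for j in range(3):
            eye = 1.0 if i == j else 0.0
            T[i][j] = eye + a * K[i][j] + b * K2[i][j]              # R
            T[i][3] += (eye + b * K[i][j] + c * K2[i][j]) * rho[j]  # V * rho
    T[3][3] = 1.0
    return T

def se3_plus(x, delta):
    """x (+) delta = x * Exp(delta)."""
    return matmul(x, se3_exp(delta))

def to_row_major(T):
    """Flatten a 4x4 matrix to the 16-float block layout."""
    return [v for row in T for v in row]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
twist = [0.0, 0.0, math.pi / 2, math.pi / 2, 0.0, 0.0]
block = to_row_major(se3_plus(identity, twist))
```

In the flattened block the translation sits at indices 3, 7 and 11, the last column of the first three rows.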

Similarity2StateBatch#

Header: cunls/state/similarity2_state_batch.h

2D similarity (rotation + translation + scale). Tangent = 4 (\(u_x,\, u_y,\, \theta,\, \lambda=\log s\)).

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x \cdot \mathrm{Exp}(\delta)\)	9	4	3×3 sim. matrix	\([u_x,\, u_y,\, \theta,\, \lambda]\)	row-major 3×3: \([\cos\theta,\, -\sin\theta,\, t_x,\, \sin\theta,\, \cos\theta,\, t_y,\, 0,\, 0,\, 1/s]\)
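The \(1/s\) entry in the bottom-right corner encodes scale through the homogeneous coordinate. Assuming the usual convention that a point is transformed as \([p; 1]\) and then divided by its last component, the matrix maps \(p\) to \(s\,(R\,p + t)\). The helper names below are hypothetical.

```python
import math

def sim2_matrix(theta, tx, ty, s):
    """Row-major 3x3 similarity matrix in the layout [R | t; 0 0 1/s]."""
    c, sn = math.cos(theta), math.sin(theta)
    return [c, -sn, tx, sn, c, ty, 0.0, 0.0, 1.0 / s]

def transform_point(m, p):
    """Apply a row-major 3x3 homogeneous matrix to a 2D point."""
    hx = m[0] * p[0] + m[1] * p[1] + m[2]
    hy = m[3] * p[0] + m[4] * p[1] + m[5]
    hw = m[6] * p[0] + m[7] * p[1] + m[8]
    return [hx / hw, hy / hw]  # dehomogenize: divide by the last component

# Rotate by 90 degrees, translate by (1, 0), scale by 2.
m = sim2_matrix(math.pi / 2, 1.0, 0.0, 2.0)
q = transform_point(m, [1.0, 0.0])
```

Rotating (1, 0) gives (0, 1), adding the translation gives (1, 1), and dividing by 1/s = 0.5 scales the result to (2, 2).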

Similarity3StateBatch#

Header: cunls/state/similarity3_state_batch.h

3D similarity (rotation + translation + scale). Tangent = 7 (\(\omega,\, u,\, \lambda=\log s\)).

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x \cdot \mathrm{Exp}(\delta)\)	16	7	4×4 sim. matrix	\([\omega; u; \lambda]\)	row-major 4×4: \([R\,\|\,t;\; 0\; 0\; 0\; 1/s]\) (16 floats)

SL4StateBatch#

Header: cunls/state/sl4_state_batch.h

Projective special linear group SL(4). The tangent space is the 15-dimensional Lie algebra \(\mathfrak{sl}(4)\) of traceless 4×4 matrices, decomposed as \(\mathfrak{so}(4) \oplus \mathrm{sym\_off}(4) \oplus \mathrm{diag}_0(4)\) with dimensions 6 + 6 + 3 = 15.

Plus	Ambient	Tangent	Ambient space	Tangent space	Memory layout
\(x \cdot \mathrm{Exp}(\delta)\)	16	15	4×4 matrix with unit determinant	15D \(\mathfrak{sl}(4)\) Lie algebra	row-major 4×4 (16 floats)
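The dimension count follows from \(\det(\exp(A)) = \exp(\mathrm{tr}\,A)\): a traceless generator always exponentiates to a unit-determinant matrix, so 16 entries minus one trace constraint leave 15 degrees of freedom. The sketch below checks this numerically with a truncated Taylor series. The ordering of the 15 tangent coordinates used by the library is not stated here, so an arbitrary traceless matrix stands in for the image of \(\delta\).

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(a, terms=30):
    """Matrix exponential via a truncated Taylor series (small norms only)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-15:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

# Arbitrary traceless generator: the diagonal entries sum to zero.
A = [[0.10, 0.20, -0.30, 0.05],
     [0.00, -0.20, 0.10, 0.40],
     [0.30, 0.10, 0.25, -0.10],
     [-0.20, 0.05, 0.00, -0.15]]
trace_A = sum(A[i][i] for i in range(4))
det_exp_A = det(expm_taylor(A))
```

A production implementation would normally use scaling and squaring or a Padé approximant instead of a plain Taylor series.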

Constructors#

SizedStateBatch<AmbientDim, TangentDim> (constructors)#

SizedStateBatch(const float *device_ptr, size_t num_blocks)#

Parameters:

device_ptr – [in] Device pointer to contiguous state storage (num_blocks × AmbientDim floats).
num_blocks – [in] Number of state blocks.

Returns:

[out] Constructor has no return value.

SizedStateBatch( const float *device_ptr, size_t num_blocks, const int *device_constant_state_ids, size_t num_const_state_blocks )#

Parameters:

device_ptr – [in] Device pointer to contiguous state storage.
num_blocks – [in] Number of state blocks.
device_constant_state_ids – [in] Device pointer to indices of constant blocks.
num_const_state_blocks – [in] Number of constant block indices.

Returns:

[out] Constructor has no return value.

VectorStateBatch<Dim> (constructors)#

Uses the same constructor signatures as SizedStateBatch with ambient and tangent dimension Dim.

StateBatch constructors#

Each StateBatch-derived class has constructors equivalent to:

ClassName( cuBLASHandle &cublas_handle, const float *device_ptr, size_t num_blocks )#

ClassName( cuBLASHandle &cublas_handle, const float *device_ptr, size_t num_blocks, const int *device_constant_state_ids, size_t num_const_state_blocks )#

Parameters:

cublas_handle – [in] External cuBLAS handle wrapper.
device_ptr – [in] Device pointer to contiguous state storage.
num_blocks – [in] Number of state blocks.
device_constant_state_ids – [in] Device pointer to constant block indices.
num_const_state_blocks – [in] Number of constant block indices.

Returns:

[out] Constructor has no return value.

StateBatchOps#

Orchestrates Plus() across multiple state batches: gathers tangent updates from a single reduced vector, scatters to per-batch deltas, and calls each batch’s Plus().

StateBatchOps()#

Returns: [out] Constructor has no return value.

StateBatchOps( cudaStream_t stream, const std::vector<StateBatch*> &state_batches )#

Parameters:

stream – [in] CUDA stream used to initialize mappings.
state_batches – [in] Ordered list of state batches.

Returns:

[out] Constructor has no return value.

void Preprocess( cudaStream_t stream, const std::vector<StateBatch*> &state_batches )#

Parameters:

stream – [in] CUDA stream for mapping/buffer initialization.
state_batches – [in] State batches used to build reduced/full mappings.

Returns:

[out] No return value.

void Plus( cudaStream_t stream, const std::vector<const float*> &x_ptrs, const DeviceVector<float> &delta, std::vector<float*> &x_plus_delta_ptrs )#

Parameters:

stream – [in] CUDA stream for scatter/update operations.
x_ptrs – [in] Current per-batch state pointers.
delta – [in] Reduced tangent update vector.
x_plus_delta_ptrs – [out] Per-batch pointers for updated states.

Returns:

[out] No return value.

size_t NumReducedStates() const#

Returns: [out] Number of scalar optimization variables after removing constant states.
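The scatter step and the reduced-state count can be sketched on the CPU. The sketch assumes that the reduced vector concatenates, batch by batch and in block order, the tangent entries of all non-constant blocks, and that constant blocks receive a zero update. That ordering is an assumption; the actual mapping is built internally by Preprocess(). The dictionaries and function names are hypothetical stand-ins for state batches.

```python
def num_reduced_states(batches):
    """Scalar unknowns left after dropping constant blocks."""
    return sum((b["num_blocks"] - len(b["const_ids"])) * b["tangent_size"]
               for b in batches)

def scatter_delta(batches, reduced):
    """Expand one reduced vector into a full tangent buffer per batch."""
    per_batch = []
    cursor = 0
    for b in batches:
        t = b["tangent_size"]
        const = set(b["const_ids"])
        full = [0.0] * (b["num_blocks"] * t)  # constant blocks stay zero
        for block in range(b["num_blocks"]):
            if block in const:
                continue
            full[block * t:(block + 1) * t] = reduced[cursor:cursor + t]
            cursor += t
        per_batch.append(full)
    if cursor != len(reduced):
        raise ValueError("reduced vector length does not match the batches")
    return per_batch

batches = [
    {"tangent_size": 3, "num_blocks": 2, "const_ids": [0]},  # first block frozen
    {"tangent_size": 2, "num_blocks": 2, "const_ids": []},
]
reduced = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
n_reduced = num_reduced_states(batches)
deltas = scatter_delta(batches, reduced)
```

With a zero tangent update, Plus leaves a constant block unchanged, which is consistent with constant states never being modified.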

Python API (`pycunls`)#

All Python state batches inherit from the abstract StateBatch base class. Every constructor argument documented as DevicePointer accepts either a cupy.ndarray (the device pointer is extracted automatically via .data.ptr) or a raw int GPU device address.

Common `StateBatch` interface#

Every state batch — built-in or user-defined — exposes the following methods and properties.

Methods

state_block_device_ptr(index: int) -> int — returns the GPU device pointer (as an int) for state block index. The returned value is the address of the first float in the block’s ambient storage. Use these pointers to build the state_pointers list passed to Problem.add_factor_batch. index is zero-based; passing a value >= num_state_blocks returns 0 (null pointer).

Read-only properties

num_state_blocks (int) — total number of state blocks in the batch, including any constant blocks.
tangent_size (int) — tangent-space dimension per state block. This is the number of unknowns the solver allocates per block (e.g. 6 for SE(3), 3 for SO(3)).
ambient_size (int) — ambient/storage dimension per state block. The GPU buffer stores num_state_blocks * ambient_size contiguous floats (e.g. 16 for SE(3) = row-major 4×4 matrix).

`pycunls.VectorStateBatch1` / `VectorStateBatch2` / `VectorStateBatch3` / `VectorStateBatch6`#

Euclidean vector states where tangent and ambient dimensions coincide. The suffix indicates the dimension (1, 2, 3, or 6). Plus is simple addition: \(x \oplus \delta = x + \delta\).

Constructors

# All optimizable:
sb = pycunls.VectorStateBatch3(data, num_blocks)

# With constant (frozen) blocks:
sb = pycunls.VectorStateBatch3(data, num_blocks, const_state_ids, num_const)

data (DevicePointer) — contiguous GPU buffer of num_blocks × Dim floats. The state batch does not copy the data; it stores the pointer and reads/writes the buffer directly. The caller must keep the underlying allocation alive for the lifetime of the state batch.
num_blocks (int) — number of state blocks in the batch.
const_state_ids (DevicePointer, optional) — GPU int32 array containing the zero-based indices of blocks that should be held constant during optimization. Constant blocks are excluded from the solver’s tangent vector; their ambient values are never modified.
num_const (int, optional) — number of entries in const_state_ids.

`pycunls.SE3StateBatch`#

3-D rigid-body transform state batch. Ambient = 16 (row-major 4×4 homogeneous matrix), Tangent = 6 (twist \([\omega; \rho]\)). Plus is right-multiplication by the exponential map: \(T \oplus \delta = T \cdot \mathrm{Exp}(\delta)\).

Constructors

cublas = pycunls.CublasHandle()

# All optimizable:
sb = pycunls.SE3StateBatch(cublas, data, num_blocks)

# With constant blocks:
sb = pycunls.SE3StateBatch(cublas, data, num_blocks, const_ids, num_const)

cublas (CublasHandle) — shared cuBLAS handle used internally for matrix operations in the exponential map.
data (DevicePointer) — contiguous GPU buffer of num_blocks × 16 floats (row-major 4×4 matrices).
num_blocks (int) — number of state blocks (poses).
const_ids (DevicePointer, optional) — GPU int32 array of constant-block indices (e.g. a gauge anchor).
num_const (int, optional) — number of constant blocks.
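Before constructing the batch, the poses have to be laid out as contiguous row-major 4×4 blocks. The host-side sketch below prepares such a buffer with the standard library only. Uploading it to the GPU (for example into a cupy array) and calling the constructor are not shown, and `pack_se3` and `translation_pose` are hypothetical helpers.

```python
from array import array

def pack_se3(poses):
    """Flatten 4x4 nested-list matrices into one row-major float32 buffer."""
    flat = array("f")
    for T in poses:
        for row in T:
            flat.extend(float(v) for v in row)
    return flat

def translation_pose(tx, ty, tz):
    """Homogeneous 4x4 matrix with identity rotation."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

# Three poses spaced one unit apart along x.
poses = [translation_pose(float(k), 0.0, 0.0) for k in range(3)]
host_data = pack_se3(poses)
num_blocks = len(poses)

# Hold the first pose constant, e.g. as a gauge anchor.
const_ids = array("i", [0])
num_const = len(const_ids)
```

Pose k occupies elements 16k to 16k + 15 of `host_data`, with its translation at offsets 3, 7 and 11 inside the block.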

`pycunls.SO3StateBatch`#

3-D rotation state batch. Ambient = 9 (row-major 3×3 rotation matrix), Tangent = 3 (rotation vector / axis-angle). Plus: \(R \oplus \delta = R \cdot \mathrm{Exp}(\mathrm{skew}(\delta))\).