Factor API#

The factor module provides batched residual and Jacobian models for non-linear least squares. Each factor computes a residual vector from one or more state blocks. State blocks lie on manifolds; the factor API uses tangent-space dimensions for Jacobian layout and solver variables. Links to the corresponding state batch types are in the Factor inputs section and in each factor’s Inputs subsection.

C++: cunls/factor
Python: pycunls

C++ API#

Factor inputs#

Each factor’s Inputs subsection below and the State API list the required state batch types (e.g. VectorStateBatch<Dim>, SO3StateBatch).

FactorBatch#

Abstract base (cunls/factor/factor_batch.h).

\[r = f(x),\qquad J = \frac{\partial f}{\partial x}\]
bool Evaluate(
float *residuals,
float *jacobians,
float const *const *state_pointers,
cudaStream_t stream
) const#
Parameters:
  • residuals – [out] Residual output buffer.

  • jacobians – [out] Optional Jacobian output buffer (nullptr to skip).

  • state_pointers – [in] Device pointer array mapping factor inputs to state blocks.

  • stream – [in] CUDA stream for asynchronous execution.

Returns:

[out] true on success.

size_t ResidualsSize() const#
Returns:

[out] Residual dimension per factor.

std::vector<size_t> StateBlockSizes() const#
Returns:

[out] State block tangent dimensions consumed by each factor.

size_t NumFactors() const#
Returns:

[out] Number of factors in the batch.

SizedFactorBatch<kResidualSize, …kStateBlockSizes>#

Compile-time convenience base (cunls/factor/sized_factor_batch.h) that fixes residual and state (tangent) dimensions at compile time.

Each specialization exposes sized_layout — an alias for the same SizedFactorBatch<kResidualSize, kStateBlockSizes...> type. Wrapper templates such as InformationFactorBatch<T> and WeightedFactorBatch<T> inherit public T::sized_layout so they remain full SizedFactorBatch instances with the same layout as the inner batch T.

PriorVectorFactorBatch<Dim>#

Header: cunls/factor/prior_vector_factor_batch.h

Prior on a Euclidean vector (e.g. bias, landmark). Pulls the state toward observed values.

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = x - o\) | \(\mathrm{Dim}\) | \(I\) | \(\mathrm{Dim} \times \mathrm{Dim}\) | \(\mathbb{R}^{\mathrm{Dim}}\) |

Inputs: \(x\) = state vector, \(o\) = observation (constructor). State: one block from VectorStateBatch<Dim> (see State API).

Constructor:

PriorVectorFactorBatch(const Vector<Dim>* observations_ptr, size_t num_factors)
  • observations_ptr — [in] Device pointer to observed vectors.

  • num_factors — [in] Number of factors in this batch.
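As a CPU-side illustration of the residual model above (not the library's CUDA kernel), the following plain-Python sketch evaluates \(r = x - o\) and the identity Jacobian for a single factor. The helper names `prior_vector_residual` and `identity_jacobian` are hypothetical and exist only for this example.

```python
def prior_vector_residual(x, o):
    """Residual r = x - o for one factor; x and o have the same length."""
    if len(x) != len(o):
        raise ValueError("state and observation must have the same dimension")
    return [xi - oi for xi, oi in zip(x, o)]


def identity_jacobian(dim):
    """Dim x Dim identity, the Jacobian of r = x - o with respect to x."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


x = [1.0, 2.0, 3.5]   # current state (illustrative values)
o = [1.0, 1.5, 3.0]   # observed target
r = prior_vector_residual(x, o)
J = identity_jacobian(len(x))
```

The residual is zero exactly when the state equals the observation, which is why the factor pulls the state toward the observed value.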

SO2PriorFactorBatch#

Header: cunls/factor/so2_prior_factor_batch.h

Prior on a 2D rotation (e.g. heading). Penalizes deviation from a target rotation.

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = \mathrm{Log}(R_{\mathrm{target}}^\top R)\) | 1 | \(1\) | \(1 \times 1\) | SO(2) |

Inputs: \(R\) = current rotation (state), \(R_{\mathrm{target}}\) = target rotation (constructor). State: one block from SO2StateBatch (see State API).

SO2PriorFactorBatch(
const Matrix<2> *observations_ptr,
size_t num_factors
)#
Parameters:
  • observations_ptr – [in] Device pointer to SO(2) observations (2×2 row-major).

  • num_factors – [in] Number of factors.

Returns:

Constructor has no return value.
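For SO(2) the Log map reduces to extracting an angle, so the residual above can be checked on the CPU with a few lines of plain Python. This is a reference sketch, not the library kernel; `so2_matrix` and `so2_prior_residual` are hypothetical helper names.

```python
import math


def so2_matrix(theta):
    """Row-major 2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def so2_prior_residual(R_target, R):
    """r = Log(R_target^T R): the signed angle from R_target to R."""
    # Only the first column of E = R_target^T R is needed: (cos r, sin r).
    e00 = R_target[0][0] * R[0][0] + R_target[1][0] * R[1][0]
    e10 = R_target[0][1] * R[0][0] + R_target[1][1] * R[1][0]
    return math.atan2(e10, e00)


r_small = so2_prior_residual(so2_matrix(0.2), so2_matrix(0.5))
# Angles wrap: going from +3.0 rad to -3.0 rad is a short positive turn.
r_wrapped = so2_prior_residual(so2_matrix(3.0), so2_matrix(-3.0))
```

Because `atan2` returns a value in \((-\pi, \pi]\), the residual always measures the shorter way around the circle.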

SO3PriorFactorBatch#

Header: cunls/factor/so3_prior_factor_batch.h

Prior on a 3D rotation. Penalizes deviation from a target orientation.

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = \mathrm{Log}(R_{\mathrm{target}}^\top R)\) | 3 | \(J_r^{-1}(r)\) | \(3 \times 3\) | SO(3) |

Inputs: \(R\) = current rotation (state), \(R_{\mathrm{target}}\) = target rotation (constructor). State: one block from SO3StateBatch (see State API).

SO3PriorFactorBatch(
const Matrix<3> *observations_ptr,
size_t num_factors
)#
Parameters:
  • observations_ptr – [in] Device pointer to SO(3) observations (3×3 row-major).

  • num_factors – [in] Number of factors.

Returns:

Constructor has no return value.
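The SO(3) residual can be reproduced on the CPU with the standard closed-form Log map. The sketch below is a plain-Python reference under the assumption that the rotation angle stays away from \(\pi\), where this formula becomes ill-conditioned. It is not the library implementation, and all function names are hypothetical.

```python
import math


def transpose_times(A, B):
    """Return A^T B for 3x3 matrices stored as nested lists."""
    return [[sum(A[k][i] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_log(R):
    """Rotation vector of R (axis times angle); valid away from angle = pi."""
    cos_theta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against round-off
    theta = math.acos(cos_theta)
    # Off-diagonal differences equal 2 * sin(theta) * axis.
    v = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
    scale = 0.5 if theta < 1e-6 else theta / (2.0 * math.sin(theta))
    return [scale * vi for vi in v]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Target yaw 0.1 rad, current yaw 0.5 rad: the residual is 0.4 rad about z.
r = so3_log(transpose_times(rot_z(0.1), rot_z(0.5)))
```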

SE2PriorFactorBatch#

Header: cunls/factor/se2_prior_factor_batch.h

Prior on 2D rigid transform. State: one block from SE2StateBatch (see State API).

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = \mathrm{Log}(T_{\mathrm{target}}^{-1} T)\) | 3 | \(J_r^{-1}(r)\) | \(3 \times 3\) | SE(2) |

SE3PriorFactorBatch#

Header: cunls/factor/se3_prior_factor_batch.h

Prior on 3D rigid transform. State: one block from SE3StateBatch (see State API).

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = \mathrm{Log}(T_{\mathrm{target}}^{-1} T)\) | 6 | \(J_r^{-1}(r)\) | \(6 \times 6\) | SE(3) |

Similarity2PriorFactorBatch#

Header: cunls/factor/similarity2_prior_factor_batch.h

Prior on 2D similarity transform. State: one block from Similarity2StateBatch (see State API).

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = \mathrm{Log}(T_{\mathrm{target}}^{-1} T)\) | 4 | \(J_r^{-1}(r)\) | \(4 \times 4\) | Sim(2) |

Similarity3PriorFactorBatch#

Header: cunls/factor/similarity3_prior_factor_batch.h

Prior on 3D similarity transform. State: one block from Similarity3StateBatch (see State API).

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = \mathrm{Log}(T_{\mathrm{target}}^{-1} T)\) | 7 | \(J_r^{-1}(r)\) | \(7 \times 7\) | Sim(3) |

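Constructors (shared by SE2PriorFactorBatch, SE3PriorFactorBatch, Similarity2PriorFactorBatch, and Similarity3PriorFactorBatch; ClassName and ObsType are placeholders):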

ClassName(const ObsType *observations_ptr, size_t num_factors)#
Parameters:
  • observations_ptr – [in] Device pointer to observation transforms.

  • num_factors – [in] Number of factors.

Returns:

Constructor has no return value.

SL4PriorFactorBatch#

Header: cunls/factor/sl4_prior_factor_batch.h

Prior on an SL(4) transform. State: one block from SL4StateBatch (see State API).

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(r = \mathrm{Log}(T_{\mathrm{target}}^{-1} T)\) | 15 | \(I\) | \(15 \times 15\) | SL(4) |

SL4PriorFactorBatch(
const SL4Transform *observations_ptr,
size_t num_factors
)#
Parameters:
  • observations_ptr – [in] Device pointer to SL(4) target transforms (row-major 4×4).

  • num_factors – [in] Number of factors.

Returns:

Constructor has no return value.

SE3BetweenFactorBatch#

Header: cunls/factor/se3_between_factor_batch.h

Constrains the relative pose between two SE(3) frames (e.g. odometry, loop closure).

\[r = \mathrm{Log}\bigl( \Delta^{-1} \, T_{\mathrm{left}}^{-1} \, T_{\mathrm{right}} \bigr)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 6 | left/right SE(3) Jacobians | \(6 \times 12\) | SE(3) × SE(3) |

Inputs: \(T_{\mathrm{left}}\), \(T_{\mathrm{right}}\) = two poses (state blocks). State: two blocks from SE3StateBatch (see State API). \(\Delta\) = measured relative transform (constructor).

SE3BetweenFactorBatch(
const SE3Transform *pose_deltas_ptr,
size_t num_factors
)#
Parameters:
  • pose_deltas_ptr – [in] Device pointer to measured relative transforms.

  • num_factors – [in] Number of between constraints.

Returns:

Constructor has no return value.
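To see what the residual measures, it helps to form the error transform \(E = \Delta^{-1} T_{\mathrm{left}}^{-1} T_{\mathrm{right}}\) by hand: when the measurement agrees with the two poses, \(E\) is the identity and \(\mathrm{Log}(E) = 0\). The plain-Python sketch below checks this on the CPU with row-major 4×4 nested lists. The helpers (`matmul4`, `se3_inverse`, `planar_pose`, `between_error`) are hypothetical, and the Log map itself is omitted.

```python
import math


def matmul4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def se3_inverse(T):
    """Inverse of a rigid transform: [R t; 0 1]^-1 = [R^T, -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def planar_pose(yaw, x, y):
    """Illustrative pose: rotation about z plus a translation in the xy-plane."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def between_error(delta, T_left, T_right):
    """E = delta^-1 * T_left^-1 * T_right; the factor's residual is Log(E)."""
    return matmul4(se3_inverse(delta), matmul4(se3_inverse(T_left), T_right))


T_left = planar_pose(0.3, 1.0, 2.0)
T_right = planar_pose(0.8, 2.5, 2.0)
delta = matmul4(se3_inverse(T_left), T_right)  # a perfectly consistent measurement
E = between_error(delta, T_left, T_right)      # numerically the identity
```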

SE2BetweenFactorBatch#

Header: cunls/factor/se2_between_factor_batch.h

Constrains the relative transform between two SE(2) frames.

\[r = \mathrm{Log}\bigl( \Delta^{-1} \, T_{\mathrm{left}}^{-1} \, T_{\mathrm{right}} \bigr)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 3 | left/right SE(2) Jacobians | \(3 \times 6\) | SE(2) × SE(2) |

Inputs: \(T_{\mathrm{left}}\), \(T_{\mathrm{right}}\) = two poses (state blocks). State: two blocks from SE2StateBatch (see State API). \(\Delta\) = measured relative transform (constructor).

SE2BetweenFactorBatch(
const Matrix<3> *pose_deltas_ptr,
size_t num_factors
)#
Parameters:
  • pose_deltas_ptr – [in] Device pointer to measured relative transforms (row-major 3×3).

  • num_factors – [in] Number of between constraints.

Returns:

Constructor has no return value.

SO2BetweenFactorBatch#

Header: cunls/factor/so2_between_factor_batch.h

Constrains the relative rotation between two SO(2) frames.

\[r = \mathrm{Log}\bigl( \Delta^{\top} \, R_{\mathrm{left}}^{\top} \, R_{\mathrm{right}} \bigr)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 1 | left/right SO(2) Jacobians | \(1 \times 2\) | SO(2) × SO(2) |

Inputs: \(R_{\mathrm{left}}\), \(R_{\mathrm{right}}\) = two rotations (state blocks). State: two blocks from SO2StateBatch (see State API). \(\Delta\) = measured relative rotation (constructor).

SO2BetweenFactorBatch(
const Matrix<2> *rotation_deltas_ptr,
size_t num_factors
)#
Parameters:
  • rotation_deltas_ptr – [in] Device pointer to measured relative rotations (row-major 2×2).

  • num_factors – [in] Number of between constraints.

Returns:

Constructor has no return value.

SO3BetweenFactorBatch#

Header: cunls/factor/so3_between_factor_batch.h

Constrains the relative rotation between two SO(3) frames.

\[r = \mathrm{Log}\bigl( \Delta^{\top} \, R_{\mathrm{left}}^{\top} \, R_{\mathrm{right}} \bigr)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 3 | left/right SO(3) Jacobians | \(3 \times 6\) | SO(3) × SO(3) |

Inputs: \(R_{\mathrm{left}}\), \(R_{\mathrm{right}}\) = two rotations (state blocks). State: two blocks from SO3StateBatch (see State API). \(\Delta\) = measured relative rotation (constructor).

SO3BetweenFactorBatch(
const Matrix<3> *rotation_deltas_ptr,
size_t num_factors
)#
Parameters:
  • rotation_deltas_ptr – [in] Device pointer to measured relative rotations (row-major 3×3).

  • num_factors – [in] Number of between constraints.

Returns:

Constructor has no return value.

Similarity2BetweenFactorBatch#

Header: cunls/factor/similarity2_between_factor_batch.h

Constrains the relative transform between two Sim(2) frames.

\[r = \mathrm{Log}\bigl( \Delta^{-1} \, T_{\mathrm{left}}^{-1} \, T_{\mathrm{right}} \bigr)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 4 | left/right Sim(2) Jacobians | \(4 \times 8\) | Sim(2) × Sim(2) |

Inputs: \(T_{\mathrm{left}}\), \(T_{\mathrm{right}}\) = two transforms (state blocks). State: two blocks from Similarity2StateBatch (see State API). \(\Delta\) = measured relative transform (constructor).

Similarity2BetweenFactorBatch(
const Matrix<3> *pose_deltas_ptr,
size_t num_factors
)#
Parameters:
  • pose_deltas_ptr – [in] Device pointer to measured relative transforms (row-major 3×3).

  • num_factors – [in] Number of between constraints.

Returns:

Constructor has no return value.

Similarity3BetweenFactorBatch#

Header: cunls/factor/similarity3_between_factor_batch.h

Constrains the relative transform between two Sim(3) frames.

\[r = \mathrm{Log}\bigl( \Delta^{-1} \, T_{\mathrm{left}}^{-1} \, T_{\mathrm{right}} \bigr)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 7 | left/right Sim(3) Jacobians | \(7 \times 14\) | Sim(3) × Sim(3) |

Inputs: \(T_{\mathrm{left}}\), \(T_{\mathrm{right}}\) = two transforms (state blocks). State: two blocks from Similarity3StateBatch (see State API). \(\Delta\) = measured relative transform (constructor).

Similarity3BetweenFactorBatch(
cuBLASHandle &cublas_handle,
const Matrix<4> *pose_deltas_ptr,
size_t num_factors
)#
Parameters:
  • cublas_handle – [in] External cuBLAS handle wrapper.

  • pose_deltas_ptr – [in] Device pointer to measured relative transforms (row-major 4×4).

  • num_factors – [in] Number of between constraints.

Returns:

Constructor has no return value.

SL4BetweenFactorBatch#

Header: cunls/factor/sl4_between_factor_batch.h

Constrains the relative transform between two SL(4) frames.

\[r = \mathrm{Log}\bigl( \Delta^{-1} \, T_{\mathrm{left}}^{-1} \, T_{\mathrm{right}} \bigr)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 15 | left/right SL(4) Jacobians | \(15 \times 30\) | SL(4) × SL(4) |

Inputs: \(T_{\mathrm{left}}\), \(T_{\mathrm{right}}\) = two transforms (state blocks). State: two blocks from SL4StateBatch (see State API). \(\Delta\) = measured relative transform (constructor).

SL4BetweenFactorBatch(
const SL4Transform *pose_deltas_ptr,
size_t num_factors
)#
Parameters:
  • pose_deltas_ptr – [in] Device pointer to measured relative transforms (row-major 4×4, unit determinant).

  • num_factors – [in] Number of between constraints.

Returns:

Constructor has no return value.

VectorBetweenFactorBatch<Dim>#

Header: cunls/factor/vector_between_factor_batch.h

Constrains the difference between two Euclidean vector states.

\[r = x_{\mathrm{left}} - x_{\mathrm{right}} - \delta\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| \(x_l - x_r - \delta\) | \(\mathrm{Dim}\) | \([I \;\mid\; {-}I]\) | \(\mathrm{Dim} \times 2\mathrm{Dim}\) | \(\mathbb{R}^{\mathrm{Dim}} \times \mathbb{R}^{\mathrm{Dim}}\) |

Inputs: \(x_l\), \(x_r\) = two vector states. State: two blocks from VectorStateBatch<Dim> (see State API). \(\delta\) = measured difference (constructor).

Constructor:

VectorBetweenFactorBatch(const Vector<Dim>* deltas_ptr, size_t num_factors)
  • deltas_ptr — [in] Device pointer to measured difference vectors.

  • num_factors — [in] Number of factors in this batch.
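The residual and its \(\mathrm{Dim} \times 2\mathrm{Dim}\) Jacobian can be written out directly. The following plain-Python sketch is a CPU reference for a single factor, with the left block's columns placed before the right block's as in the \([I \mid -I]\) layout above. `vector_between` is a hypothetical helper name.

```python
def vector_between(x_left, x_right, delta):
    """Return (r, J) with r = x_left - x_right - delta and J = [I | -I]."""
    dim = len(delta)
    if len(x_left) != dim or len(x_right) != dim:
        raise ValueError("all inputs must have the same dimension")
    r = [x_left[i] - x_right[i] - delta[i] for i in range(dim)]
    J = [[1.0 if j == i else (-1.0 if j == dim + i else 0.0)
          for j in range(2 * dim)]
         for i in range(dim)]
    return r, J


r, J = vector_between([2.0, 5.0], [1.0, 1.0], [1.0, 3.5])
```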

ReprojectionFactorBatch#

Header: cunls/factor/reprojection_factor_batch.h

Reprojection error for bundle adjustment. Observations in normalized image coordinates.

\[\begin{split}P_{\mathrm{cam}} = T_{\mathrm{cam}} P,\qquad r = \begin{bmatrix} P_{\mathrm{cam},x}/P_{\mathrm{cam},z} - x_n \\ P_{\mathrm{cam},y}/P_{\mathrm{cam},z} - y_n \end{bmatrix}\end{split}\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 2 | chain rule on projection | \(2 \times 9\) | SE(3) × \(\mathbb{R}^3\) |

Inputs: Pose \(T_{\mathrm{cam}}\) (state block 1), 3D point \(P\) (state block 2). State: SE3StateBatch then VectorStateBatch<3> (see State API). Observations \((x_n, y_n)\) and optional camera-from-rig from constructor.

ReprojectionFactorBatch(
const Vector<2> *observations,
size_t num_observations,
float z_threshold = 1e-3f
)#
ReprojectionFactorBatch(
const Vector<2> *observations,
const SE3Transform *poses_camera_from_rig,
size_t num_observations,
float z_threshold = 1e-3f
)#
Parameters:
  • observations – [in] Device pointer to normalized observations.

  • poses_camera_from_rig – [in] Optional device pointer to camera extrinsics (second overload).

  • num_observations – [in] Number of reprojection factors.

  • z_threshold – [in] Minimum valid depth.

Returns:

Constructor has no return value.
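The projection formula above can be exercised on the CPU with a short plain-Python sketch. It assumes identity camera-from-rig extrinsics (the first overload) and follows the Python section's description of z_threshold, where points with depth below the threshold produce a zero residual. `reprojection_residual` is a hypothetical name, not a library function.

```python
def reprojection_residual(T_cam, P, obs, z_threshold=1e-3):
    """Normalized-pinhole residual for one factor.

    T_cam: row-major 4x4 pose as nested lists, P: 3-D point,
    obs: normalized observation (x_n, y_n).
    """
    p_cam = [sum(T_cam[i][k] * P[k] for k in range(3)) + T_cam[i][3]
             for i in range(3)]
    if p_cam[2] < z_threshold:
        # Point is too close to (or behind) the camera: contribute nothing.
        return [0.0, 0.0]
    return [p_cam[0] / p_cam[2] - obs[0], p_cam[1] / p_cam[2] - obs[1]]


identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
r_front = reprojection_residual(identity, [1.0, 2.0, 4.0], [0.25, 0.4])
r_behind = reprojection_residual(identity, [1.0, 2.0, -1.0], [0.25, 0.4])
```

Here the point projects to (0.25, 0.5), so only the second residual component is nonzero for the point in front of the camera.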

PnPFactorBatch#

Header: cunls/factor/pnp_factor_batch.h

Fixed-structure Perspective-n-Point reprojection: the same normalized pinhole residual as ReprojectionFactorBatch, but each 3D landmark is held in device memory passed to the constructor (not a state variable). Only the SE(3) pose is optimized; the analytic Jacobian is therefore \(2 \times 6\).

\[\begin{split}P_{\mathrm{cam}} = T_{\mathrm{cam}\leftarrow\mathrm{world}}\, P_{\mathrm{world}},\qquad r = \begin{bmatrix} P_{\mathrm{cam},x}/P_{\mathrm{cam},z} - x_n \\ P_{\mathrm{cam},y}/P_{\mathrm{cam},z} - y_n \end{bmatrix}\end{split}\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (same pinhole model as reprojection) | 2 | pose chain rule only | \(2 \times 6\) | SE(3) |

Inputs: Normalized observations \((x_n,y_n)\) and matching world points \(P_{\mathrm{world}}\) from the constructor (one pair per factor). State: a single SE3StateBatch block per factor (rig-from-world pose, or world-to-camera according to your convention—match how you built the observations). Optional poses_camera_from_rig uses the same composition as ReprojectionFactorBatch.

PnPFactorBatch(
const Vector<2> *observations,
const Vector<3> *points_world,
size_t num_observations,
float z_threshold = 1e-3f
)#
PnPFactorBatch(
const Vector<2> *observations,
const SE3Transform *poses_camera_from_rig,
const Vector<3> *points_world,
size_t num_observations,
float z_threshold = 1e-3f
)#
Parameters:
  • observations – [in] Device pointer to normalized 2-D observations.

  • points_world – [in] Device pointer to fixed world points \(P\).

  • poses_camera_from_rig – [in] Optional per-factor rig extrinsics (second overload).

  • num_observations – [in] Number of PnP correspondences.

  • z_threshold – [in] Minimum valid camera-frame depth.

Returns:

Constructor has no return value.

PointToPointFactorBatch#

Header: cunls/factor/point_to_point_factor_batch.h

Point cloud registration (e.g. ICP). Residual = target point minus transformed source point.

\[r = p - T q = p - (R q + t)\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 3 | \([\partial r/\partial\omega\;\partial r/\partial\rho] = [R[q]_\times\;{-}R]\) | \(3 \times 6\) | SE(3) |

Inputs: \(T\) = pose (state). State: one block from SE3StateBatch (see State API). \(p\), \(q\) = target/source points (constructor).

PointToPointFactorBatch(
const Vector<3> *p_observations_ptr,
const Vector<3> *q_observations_ptr,
size_t num_factors
)#
Parameters:
  • p_observations_ptr – [in] Device pointer to target points.

  • q_observations_ptr – [in] Device pointer to source points.

  • num_factors – [in] Number of correspondences.

Returns:

Constructor has no return value.
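A CPU reference for one correspondence makes the Jacobian layout concrete. The plain-Python sketch below evaluates \(r = p - (Rq + t)\) and assembles the \(3 \times 6\) Jacobian \([R[q]_\times \;\; {-}R]\) with the rotational columns first, as in the table. The helper names are hypothetical, and the column order in the library's buffers is assumed to match the table.

```python
def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == v x w."""
    return [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]


def matmul3(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def point_to_point(R, t, p, q):
    """Return (r, J) for one correspondence; J is 3x6 as [d/d omega | d/d rho]."""
    Rq = [sum(R[i][k] * q[k] for k in range(3)) for i in range(3)]
    r = [p[i] - (Rq[i] + t[i]) for i in range(3)]
    d_omega = matmul3(R, skew(q))                          # R [q]x
    d_rho = [[-R[i][j] for j in range(3)] for i in range(3)]  # -R
    J = [d_omega[i] + d_rho[i] for i in range(3)]
    return r, J


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# The source point q, shifted by t, lands exactly on the target p.
r, J = point_to_point(I3, [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0])
```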

PointToPlaneFactorBatch#

Header: cunls/factor/point_to_plane_factor_batch.h

Plane-based ICP: signed distance from transformed source point to target plane.

\[r = n_q^\top (p - T q) = n_q \cdot (p - (R q + t))\]

With \(n' = R^\top n_q\), the Jacobian row is \([n'^\top [q]_\times,\; -n'^\top]\).

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 1 | \([n'^\top [q]_\times\;{-}n'^\top]\) | \(1 \times 6\) | SE(3) |

Inputs: \(T\) = pose (state). State: one block from SE3StateBatch (see State API). \(p\), \(q\), \(n_q\) = target point, source point, source normal (constructor).

PointToPlaneFactorBatch(
const Vector<3> *p_observations_ptr,
const Vector<3> *q_observations_ptr,
const Vector<3> *nq_observations_ptr,
size_t num_factors
)#
Parameters:
  • p_observations_ptr – [in] Device pointer to target points.

  • q_observations_ptr – [in] Device pointer to source points.

  • nq_observations_ptr – [in] Device pointer to source normals.

  • num_factors – [in] Number of correspondences.

Returns:

Constructor has no return value.
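The scalar residual and its Jacobian row can be checked with a small plain-Python sketch. It uses the identity \(n'^\top [q]_\times = (n' \times q)^\top\), so the rotational part of the row is a single cross product. This is a CPU reference under the formulas above, not the library kernel, and `cross` and `point_to_plane` are hypothetical helpers.

```python
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def point_to_plane(R, t, p, q, n_q):
    """Return (r, J): r = n_q . (p - (R q + t)), J = [n'^T [q]x, -n'^T]."""
    Tq = [sum(R[i][k] * q[k] for k in range(3)) + t[i] for i in range(3)]
    r = sum(n_q[i] * (p[i] - Tq[i]) for i in range(3))
    n_prime = [sum(R[k][i] * n_q[k] for k in range(3)) for i in range(3)]  # R^T n_q
    J = cross(n_prime, q) + [-v for v in n_prime]
    return r, J


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Target point sits 0.5 above the source point along the normal.
r, J = point_to_plane(I3, [0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.5], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```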

SymmetricPointToPlaneFactorBatch#

Header: cunls/factor/symmetric_point_to_plane_factor_batch.h

Symmetric point-to-plane: both frames contribute normals; \(N = n_p + n_q\).

\[r = N^\top \bigl( T p - T^{-1} q \bigr) = \bigl( (R p + t) - R^\top(q - t) \bigr)^\top N\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | 1 | \(\mathrm{d}(T p,\, T^{-1} q)\) w.r.t. \(T\) | \(1 \times 6\) | SE(3) |

Inputs: \(T\) = pose (state). State: one block from SE3StateBatch (see State API). \(p\), \(n_p\), \(q\), \(n_q\) = target/source points and normals (constructor).

SymmetricPointToPlaneFactorBatch(
const Vector<3> *p_observations_ptr,
const Vector<3> *q_observations_ptr,
const Vector<3> *np_observations_ptr,
const Vector<3> *nq_observations_ptr,
size_t num_factors
)#
Parameters:
  • p_observations_ptr – [in] Device pointer to target points.

  • q_observations_ptr – [in] Device pointer to source points.

  • np_observations_ptr – [in] Device pointer to target normals.

  • nq_observations_ptr – [in] Device pointer to source normals.

  • num_factors – [in] Number of correspondences.

Returns:

Constructor has no return value.
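The symmetric residual applies the pose forward to \(p\) and backward to \(q\) before projecting onto the summed normal. A plain-Python CPU sketch of the formula above follows. `symmetric_point_to_plane_residual` is a hypothetical name and only the residual, not the Jacobian, is shown.

```python
def symmetric_point_to_plane_residual(R, t, p, q, n_p, n_q):
    """r = N . (T p - T^-1 q) with N = n_p + n_q, T = (R, t)."""
    N = [n_p[i] + n_q[i] for i in range(3)]
    Tp = [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
    q_minus_t = [q[i] - t[i] for i in range(3)]
    # T^-1 q = R^T (q - t)
    Tinv_q = [sum(R[k][i] * q_minus_t[k] for k in range(3)) for i in range(3)]
    return sum(N[i] * (Tp[i] - Tinv_q[i]) for i in range(3))


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
up = [0.0, 0.0, 1.0]
# A 0.1 offset along the shared normal moves T p up and T^-1 q down,
# so the residual is 2 * (0.1 + 0.1) = 0.4.
r = symmetric_point_to_plane_residual(I3, [0.0, 0.0, 0.1], up, up, up, up)
r_zero = symmetric_point_to_plane_residual(I3, [0.0, 0.0, 0.0], up, up, up, up)
```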

InformationFactorBatch<T>#

Header: cunls/factor/information_factor_batch.h

Inheritance: class InformationFactorBatch : public T::sized_layout — i.e. the same SizedFactorBatch<kResidualSize, ...> as the wrapped type T. Residual and state-block sizes come from that base; this class adds NumFactors, storage for T, and an Evaluate that applies \(\Omega^{1/2}\) after the inner factor.

T must derive from some SizedFactorBatch (see type trait IsDerivedFromAnySizedFactorBatch).

Wraps a factor to apply a square-root information matrix \(\Omega^{1/2}\) (e.g. from measurement covariance).

\[r_{\mathrm{weighted}} = \Omega^{1/2} r,\qquad J_{\mathrm{weighted}} = \Omega^{1/2} J\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | (per \(T\)) | (see above) | (per \(T\)) | same as wrapped \(T\) |

Inputs: Same state layout as the wrapped factor T. Constructor also takes sqrt_information_matrices_ptr (per-factor \(\Omega^{1/2}\)).

template<class ...Args>
InformationFactorBatch(
cuBLASHandle &cublas_handle,
const Matrix<T::residual_size_> *sqrt_information_matrices_ptr,
size_t num_matrices,
Args&&... sized_factor_batch_args
)#
Parameters:
  • cublas_handle – [in] External cuBLAS handle wrapper.

  • sqrt_information_matrices_ptr – [in] Device pointer to per-factor square-root information matrices.

  • num_matrices – [in] Number of square-root information matrices; must equal T::NumFactors() after T is constructed.

  • sized_factor_batch_args – [in] Constructor arguments forwarded to wrapped factor T (same order as T’s constructor — include a leading cuBLASHandle when T requires one, or (weight, …) when T is WeightedFactorBatch<U>).

Returns:

Constructor has no return value.
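When the measurement covariance is diagonal, \(\Sigma = \mathrm{diag}(\sigma_i^2)\), the information matrix is \(\Omega = \Sigma^{-1}\) and its square root is simply \(\mathrm{diag}(1/\sigma_i)\). The plain-Python sketch below builds such a matrix and applies it to a residual and Jacobian on the CPU. The helper names are hypothetical. For a full covariance, the source does not state which triangular-factor convention the library expects, so that case is left out.

```python
def sqrt_information_from_sigmas(sigmas):
    """Square-root information matrix diag(1 / sigma_i) for diagonal covariance."""
    n = len(sigmas)
    return [[1.0 / sigmas[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def whiten(sqrt_info, r, J):
    """Return (sqrt_info @ r, sqrt_info @ J) for one factor."""
    n = len(r)
    cols = len(J[0])
    r_w = [sum(sqrt_info[i][k] * r[k] for k in range(n)) for i in range(n)]
    J_w = [[sum(sqrt_info[i][k] * J[k][c] for k in range(n)) for c in range(cols)]
           for i in range(n)]
    return r_w, J_w


sqrt_info = sqrt_information_from_sigmas([0.5, 2.0])
r_w, J_w = whiten(sqrt_info, [1.0, 4.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

The precise first component (sigma 0.5) is amplified and the noisy second component (sigma 2.0) is damped, so both end up expressed in units of standard deviations.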

WeightedFactorBatch<T>#

Header: cunls/factor/weighted_factor_batch.h

Inheritance: class WeightedFactorBatch : public T::sized_layout (same SizedFactorBatch specialization as T), with NumFactors and Evaluate extended for scalar weighting.

T must derive from SizedFactorBatch.

Wraps a factor to apply scalar weight(s) to residuals and Jacobians. Supports two modes: a single uniform weight applied to every factor, or per-factor weights from a device array.

\[r_{\mathrm{weighted}} = w \, r,\qquad J_{\mathrm{weighted}} = w \, J\]

| Residual | Residual dim | Jacobian | Jacobian dims | Manifold |
| --- | --- | --- | --- | --- |
| (see above) | (per \(T\)) | (see above) | (per \(T\)) | same as wrapped \(T\) |

Inputs: Same state layout as the wrapped factor T. Constructor also takes either a single float weight or a const float* device pointer to per-factor weights.

template<class ...Args>
WeightedFactorBatch(
float weight,
Args&&... sized_factor_batch_args
)#

Uniform weight constructor. Multiplies every factor’s residual and Jacobian by the same scalar weight. The batch size is the inner factor’s NumFactors().

Parameters:
  • weight – [in] Scalar weight applied to all factors.

  • sized_factor_batch_args – [in] Constructor arguments forwarded to wrapped factor T.

Returns:

Constructor has no return value.

template<class ...Args>
WeightedFactorBatch(
const float *per_factor_weights,
size_t num_weights,
Args&&... sized_factor_batch_args
)#

Per-factor weight constructor. Factor i has its residual and Jacobian multiplied by per_factor_weights[i].

Parameters:
  • per_factor_weights – [in] Device pointer to per-factor weights (at least num_weights floats).

  • num_weights – [in] Number of weights; must equal T::NumFactors() for the constructed inner batch.

  • sized_factor_batch_args – [in] Constructor arguments forwarded to wrapped factor T.

Returns:

Constructor has no return value.
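Because the weight multiplies the residual itself, a factor's contribution to the least-squares cost scales with \(w^2\), not \(w\): \(\|w\,r\|^2 = w^2\|r\|^2\). The plain-Python sketch below illustrates the per-factor mode on the CPU. `apply_weights` and `squared_norm` are hypothetical helpers.

```python
def apply_weights(residuals, weights):
    """Scale factor i's residual vector by weights[i]."""
    if len(weights) != len(residuals):
        raise ValueError("need exactly one weight per factor")
    return [[w * ri for ri in r] for r, w in zip(residuals, weights)]


def squared_norm(r):
    return sum(ri * ri for ri in r)


residuals = [[3.0, 4.0], [1.0, 0.0]]   # two factors, residual dimension 2
weights = [0.5, 2.0]
weighted = apply_weights(residuals, weights)
cost_before = squared_norm(residuals[0])   # 25.0
cost_after = squared_norm(weighted[0])     # 0.5**2 * 25.0
```

To scale a factor's cost by a factor \(c\), the weight to pass is therefore \(\sqrt{c}\).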

Python API (pycunls)#

All Python factor batches inherit from the abstract FactorBatch base class. Every constructor argument documented as DevicePointer accepts either a cupy.ndarray (the device pointer is extracted automatically via .data.ptr) or a raw int GPU device address.

The residual formulas, Jacobian structure, and state layouts are identical to the C++ versions documented in the sections above — this section focuses on the Python constructor signatures, methods, and properties.

Common FactorBatch interface#

Every factor batch — built-in or user-defined — exposes the following read-only properties and methods.

Read-only properties

  • num_factors (int) — number of factor instances in the batch.

  • residuals_size (int) — residual dimension per factor (e.g. 2 for ReprojectionFactorBatch, 6 for SE3BetweenFactorBatch).

Methods

  • state_block_sizes() -> list[int] — returns a list of tangent-space dimensions for each state block consumed by one factor. For example, ReprojectionFactorBatch returns [6, 3] (SE(3) pose then \(\mathbb{R}^3\) point), and PnPFactorBatch returns [6] (pose only; 3-D points are fixed in the constructor).
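The two properties and the method above are enough to work out the per-factor Jacobian shape: the row count is residuals_size and the column count is the sum of the tangent dimensions. The plain-Python sketch below uses ordinary lists rather than a real factor batch. The helper names are hypothetical, and placing the blocks side by side in input order is an assumption consistent with the Jacobian dims listed in the C++ section.

```python
def jacobian_shape(residuals_size, state_block_sizes):
    """Per-factor Jacobian shape: (rows, total tangent columns)."""
    return (residuals_size, sum(state_block_sizes))


def block_column_offsets(state_block_sizes):
    """Starting column of each state block when blocks are laid side by side."""
    offsets, start = [], 0
    for size in state_block_sizes:
        offsets.append(start)
        start += size
    return offsets


# Values quoted in the text for ReprojectionFactorBatch and SE3BetweenFactorBatch.
reprojection_shape = jacobian_shape(2, [6, 3])
between_shape = jacobian_shape(6, [6, 6])
reprojection_offsets = block_column_offsets([6, 3])
```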

pycunls.PriorVectorFactorBatch1 / PriorVectorFactorBatch2 / PriorVectorFactorBatch3 / PriorVectorFactorBatch6#

Prior on a Euclidean vector. Residual = \(x - o\) with identity Jacobian. The suffix indicates the dimension.

Constructor

fb = pycunls.PriorVectorFactorBatch3(observations, num_factors)
  • observations (DevicePointer) — contiguous GPU buffer of num_factors × Dim floats holding the observed (target) vectors. The factor batch does not copy the data; the caller must keep the allocation alive.

  • num_factors (int) — number of prior factors.

State layout: one block per factor from the corresponding VectorStateBatch (see pycunls.VectorStateBatch1 / VectorStateBatch2 / VectorStateBatch3 / VectorStateBatch6).

pycunls.SO2PriorFactorBatch#

Prior on a 2-D rotation. Residual = \(\mathrm{Log}(R_\mathrm{target}^\top R)\). Does not require a CublasHandle.

Constructor

fb = pycunls.SO2PriorFactorBatch(observations, num_factors)
  • observations (DevicePointer) — num_factors × 4 floats holding row-major 2×2 target rotation matrices.

  • num_factors (int) — number of prior factors.

State layout: one block per factor from SO2StateBatch (see pycunls.SO2StateBatch).

pycunls.SO3PriorFactorBatch#

Prior on a 3-D rotation. Residual = \(\mathrm{Log}(R_\mathrm{target}^\top R)\), Jacobian = \(J_r^{-1}(r)\). Does not require a CublasHandle.

Constructor

fb = pycunls.SO3PriorFactorBatch(observations, num_factors)
  • observations (DevicePointer) — num_factors × 9 floats holding row-major 3×3 target rotation matrices.

  • num_factors (int) — number of prior factors.

State layout: one block per factor from SO3StateBatch.

pycunls.SE3PriorFactorBatch#

Prior on a 3-D rigid transform. Residual = \(\mathrm{Log}(T_\mathrm{target}^{-1} T)\), Jacobian = \(J_r^{-1}(r)\). Does not require a CublasHandle.

Constructor

fb = pycunls.SE3PriorFactorBatch(observations, num_factors)
  • observations (DevicePointer) — num_factors × 16 floats holding row-major 4×4 target homogeneous matrices.

  • num_factors (int) — number of prior factors.

State layout: one block per factor from SE3StateBatch.

pycunls.SL4PriorFactorBatch#

Prior on an SL(4) transform. Residual = \(\mathrm{Log}(T_\mathrm{target}^{-1} T)\). Does not require a CublasHandle.

Constructor

fb = pycunls.SL4PriorFactorBatch(observations, num_factors)
  • observations (DevicePointer) — num_factors × 16 floats holding row-major 4×4 SL(4) target transforms.

  • num_factors (int) — number of prior factors.

State layout: one block per factor from SL4StateBatch.

pycunls.SE3BetweenFactorBatch#

Constrains the relative pose between two SE(3) frames. Residual = \(\mathrm{Log}(\Delta^{-1} T_l^{-1} T_r)\). Two state blocks per factor.

Constructor

fb = pycunls.SE3BetweenFactorBatch(deltas, num_factors)
  • deltas (DevicePointer) — num_factors × 16 floats holding row-major 4×4 measured relative transforms \(\Delta\).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor — [T_left, T_right] — both from SE3StateBatch. The state-pointer list must therefore contain 2 × num_factors entries.
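The deltas buffer is a flat run of num_factors × 16 values, one row-major 4×4 matrix after another. The plain-Python sketch below prepares that layout on the host. `flatten_transforms` is a hypothetical helper, and the upload to the GPU is deliberately left out.

```python
def flatten_transforms(transforms):
    """Concatenate 4x4 nested-list matrices into one row-major flat list."""
    flat = []
    for T in transforms:
        if len(T) != 4 or any(len(row) != 4 for row in T):
            raise ValueError("each transform must be a 4x4 matrix")
        for row in T:
            flat.extend(float(v) for v in row)
    return flat


def translation(x, y, z):
    """Illustrative pure-translation transform."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


deltas_host = flatten_transforms([translation(1.0, 0.0, 0.0),
                                  translation(0.0, 2.0, 0.0)])
num_factors = len(deltas_host) // 16
```

One way to obtain a device buffer from this list would be `cupy.asarray(deltas_host, dtype=cupy.float32)`, assuming single-precision storage as the float-based C++ signatures suggest. The resulting array must stay alive for as long as the factor batch uses it.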

pycunls.SE2BetweenFactorBatch#

Constrains the relative transform between two SE(2) frames. Residual = \(\mathrm{Log}(\Delta^{-1} T_l^{-1} T_r)\). Two state blocks per factor.

Constructor

fb = pycunls.SE2BetweenFactorBatch(deltas, num_factors)
  • deltas (DevicePointer) — num_factors × 9 floats holding row-major 3×3 measured relative transforms \(\Delta\).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor — [T_left, T_right] — both from SE2StateBatch.

pycunls.SO2BetweenFactorBatch#

Constrains the relative rotation between two SO(2) frames. Residual = \(\mathrm{Log}(\Delta^\top R_l^\top R_r)\). Two state blocks per factor.

Constructor

fb = pycunls.SO2BetweenFactorBatch(deltas, num_factors)
  • deltas (DevicePointer) — num_factors × 4 floats holding row-major 2×2 measured relative rotations \(\Delta\).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor — [R_left, R_right] — both from SO2StateBatch.

pycunls.SO3BetweenFactorBatch#

Constrains the relative rotation between two SO(3) frames. Residual = \(\mathrm{Log}(\Delta^\top R_l^\top R_r)\). Two state blocks per factor.

Constructor

fb = pycunls.SO3BetweenFactorBatch(deltas, num_factors)
  • deltas (DevicePointer) — num_factors × 9 floats holding row-major 3×3 measured relative rotations \(\Delta\).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor — [R_left, R_right] — both from SO3StateBatch.

pycunls.Similarity2BetweenFactorBatch#

Constrains the relative transform between two Sim(2) frames. Residual = \(\mathrm{Log}(\Delta^{-1} T_l^{-1} T_r)\). Two state blocks per factor.

Constructor

fb = pycunls.Similarity2BetweenFactorBatch(deltas, num_factors)
  • deltas (DevicePointer) — num_factors × 9 floats holding row-major 3×3 measured relative transforms \(\Delta\).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor — [T_left, T_right] — both from Similarity2StateBatch.

pycunls.Similarity3BetweenFactorBatch#

Constrains the relative transform between two Sim(3) frames. Residual = \(\mathrm{Log}(\Delta^{-1} T_l^{-1} T_r)\). Two state blocks per factor.

Constructor

fb = pycunls.Similarity3BetweenFactorBatch(cublas, deltas, num_factors)
  • cublas (CublasHandle) — shared cuBLAS handle.

  • deltas (DevicePointer) — num_factors × 16 floats holding row-major 4×4 measured relative transforms \(\Delta\).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor — [T_left, T_right] — both from Similarity3StateBatch.

pycunls.SL4BetweenFactorBatch#

Constrains the relative transform between two SL(4) frames. Residual = \(\mathrm{Log}(\Delta^{-1} T_l^{-1} T_r)\). Two state blocks per factor.

Constructor

fb = pycunls.SL4BetweenFactorBatch(deltas, num_factors)
  • deltas (DevicePointer) — num_factors × 16 floats holding row-major 4×4 measured relative transforms \(\Delta\) (unit determinant).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor — [T_left, T_right] — both from SL4StateBatch.

pycunls.VectorBetweenFactorBatch1 / VectorBetweenFactorBatch2 / VectorBetweenFactorBatch3 / VectorBetweenFactorBatch6#

Between factor on Euclidean vectors. Residual = \(x_l - x_r - \delta\). Two state blocks per factor.

Constructor

fb = pycunls.VectorBetweenFactorBatch3(deltas, num_factors)
  • deltas (DevicePointer) — num_factors × Dim floats holding the measured difference vectors \(\delta\).

  • num_factors (int) — number of between constraints.

State layout: two blocks per factor from the corresponding VectorStateBatch (see pycunls.VectorStateBatch1 / VectorStateBatch2 / VectorStateBatch3 / VectorStateBatch6).

pycunls.ReprojectionFactorBatch#

Reprojection error for bundle adjustment. Observations must be in normalized image coordinates (intrinsic calibration already applied): \(r = \pi(T, P) - z\), where \(\pi\) is the normalized pinhole projection and \(z = (x_n, y_n)\) is the observation.

Constructor

fb = pycunls.ReprojectionFactorBatch(
    observations, num_observations, z_threshold=1e-3)
  • observations (DevicePointer) — num_observations × 2 floats holding normalized 2-D observations \((x_n, y_n)\).

  • num_observations (int) — number of reprojection factors.

  • z_threshold (float, default 1e-3) — minimum valid depth \(z\) in camera frame. Points with \(z < z_\text{threshold}\) produce zero residuals and Jacobians to avoid singularities.

State layout: two blocks per factor — [SE3 pose, R^3 point] — from SE3StateBatch and VectorStateBatch3 respectively.

pycunls.PnPFactorBatch#

PnP-style reprojection: fixed 3-D points in the constructor, one SE(3) state per correspondence (typically the same camera pose pointer repeated).

Constructor (identity camera-from-rig)

fb = pycunls.PnPFactorBatch(
    observations, points_world, num_observations, z_threshold=1e-3)

Constructor (with camera-from-rig extrinsics per factor)

fb = pycunls.PnPFactorBatch(
    observations, poses_camera_from_rig, points_world,
    num_observations, z_threshold=1e-3)
  • observations — num_observations × 2 normalized image coordinates.

  • points_world — num_observations × 3 fixed world points (not optimized).

  • poses_camera_from_rig — num_observations × 16 row-major SE(3) matrices (optional overload).

  • z_threshold — minimum valid depth in the camera frame (same role as ReprojectionFactorBatch).

State layout: one block per factor from SE3StateBatch.

pycunls.PointToPointFactorBatch#

Point-to-point ICP factor. Residual = \(p - T q\).

Constructor

fb = pycunls.PointToPointFactorBatch(p_observations, q_observations, num_factors)
  • p_observations (DevicePointer) — num_factors × 3 floats holding target points \(p\).

  • q_observations (DevicePointer) — num_factors × 3 floats holding source points \(q\).

  • num_factors (int) — number of point correspondences.

State layout: one block per factor from SE3StateBatch.
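
A minimal CPU sketch of the residual \(p - T q\) for a single correspondence is given below. The pose is passed as a rotation matrix and translation vector purely for illustration; the function name is hypothetical and pycunls is not called.

```python
def point_to_point_residual(rotation, translation, p, q):
    """CPU reference for r = p - T q with T = (rotation, translation)."""
    # Apply the rigid transform to the source point: T q = R q + t.
    transformed = [
        sum(rotation[i][k] * q[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]
    return [p[i] - transformed[i] for i in range(3)]


# 90 degree rotation about the z axis followed by a shift along x.
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
translation = [1.0, 0.0, 0.0]

q = [1.0, 0.0, 0.0]   # source point, maps to (1, 1, 0)
p = [1.0, 1.0, 0.5]   # target point, offset by 0.5 along z

p2p_residual = point_to_point_residual(rotation, translation, p, q)
```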

pycunls.PointToPlaneFactorBatch#

Point-to-plane ICP factor. Residual = \(n_q^\top (p - T q)\).

Constructor

fb = pycunls.PointToPlaneFactorBatch(
    p_observations, q_observations, nq_observations, num_factors)
  • p_observations (DevicePointer) — target points (× 3 floats).

  • q_observations (DevicePointer) — source points (× 3 floats).

  • nq_observations (DevicePointer) — source normals (× 3 floats).

  • num_factors (int) — number of correspondences.

State layout: one block per factor from SE3StateBatch.
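
The scalar residual \(n_q^\top (p - T q)\) can be checked on the CPU with a sketch like the one below. It follows the formula exactly as written above, using the normal as given without rotating it; the function name is hypothetical and pycunls is not called.

```python
def point_to_plane_residual(rotation, translation, p, q, normal):
    """CPU reference for the scalar residual n^T (p - T q)."""
    # Apply the rigid transform to the source point: T q = R q + t.
    transformed = [
        sum(rotation[i][k] * q[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]
    difference = [p[i] - transformed[i] for i in range(3)]
    # Project the point difference onto the normal direction.
    return sum(normal[i] * difference[i] for i in range(3))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = [0.0, 0.0, 1.0]
q = [1.0, 2.0, 3.0]          # maps to (1, 2, 4)
normal = [0.0, 0.0, 1.0]

# Offset of 0.5 along the normal.
along_normal = point_to_plane_residual(identity, translation, [1.0, 2.0, 4.5], q, normal)

# Adding an in-plane offset of 2 along x leaves the residual unchanged.
with_sliding = point_to_plane_residual(identity, translation, [3.0, 2.0, 4.5], q, normal)
```

The second call shows the usual motivation for this factor: motion within the plane is not penalized, only distance along the normal.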

pycunls.SymmetricPointToPlaneFactorBatch#

Symmetric point-to-plane ICP factor. Both frames contribute normals; \(N = n_p + n_q\).

Constructor

fb = pycunls.SymmetricPointToPlaneFactorBatch(
    p_observations, q_observations,
    np_observations, nq_observations, num_factors)
  • p_observations (DevicePointer) — target points (× 3 floats).

  • q_observations (DevicePointer) — source points (× 3 floats).

  • np_observations (DevicePointer) — target normals (× 3 floats).

  • nq_observations (DevicePointer) — source normals (× 3 floats).

  • num_factors (int) — number of correspondences.

State layout: one block per factor from SE3StateBatch.

pycunls.InformationFactorBatch#

Wraps any factor batch and left-multiplies residuals and Jacobians by per-factor square-root information matrices \(\Omega^{1/2}\). Unlike the C++ template, the Python class accepts any FactorBatch — no template specialization is needed.

C++ contrast: The C++ template InformationFactorBatch<T> also inherits T::sized_layout (a SizedFactorBatch with the same compile-time layout as T). The Python wrapper is a dynamic FactorBatch only.

Constructor

info_fb = pycunls.InformationFactorBatch(
    cublas, inner_factor, sqrt_information_matrices)
  • cublas (CublasHandle) — shared cuBLAS handle.

  • inner_factor (FactorBatch) — the factor batch to wrap. The wrapper delegates Evaluate to this factor first, then applies the information matrices. The inner factor must be kept alive for the lifetime of the wrapper.

  • sqrt_information_matrices (DevicePointer) — num_factors × residual_size × residual_size contiguous floats holding one row-major square-root information matrix per factor.

Example

inner = pycunls.SE3BetweenFactorBatch(deltas, N)
info  = pycunls.InformationFactorBatch(cublas, inner, sqrt_info_gpu)

problem.add_factor_batch(info, state_pointers)
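
To make the buffer layout concrete, the sketch below flattens per-factor matrices in row-major order and left-multiplies one residual by its matrix on the CPU. It only mirrors the layout described for sqrt_information_matrices; the helper names are hypothetical, and the actual GPU path inside the wrapper is not shown.

```python
def pack_sqrt_information(matrices):
    """Flatten per-factor square matrices, row-major, into one contiguous list."""
    return [value for matrix in matrices for row in matrix for value in row]


def apply_sqrt_information(flat, factor_index, residual):
    """Left-multiply one factor's residual by its square-root information matrix."""
    n = len(residual)
    base = factor_index * n * n  # start of this factor's n x n block
    return [
        sum(flat[base + i * n + j] * residual[j] for j in range(n))
        for i in range(n)
    ]


# Two factors with residual_size = 2.
sqrt_info = [
    [[2.0, 0.0], [0.0, 4.0]],
    [[1.0, 1.0], [0.0, 1.0]],
]
flat_info = pack_sqrt_information(sqrt_info)

whitened_0 = apply_sqrt_information(flat_info, 0, [1.0, 1.0])
whitened_1 = apply_sqrt_information(flat_info, 1, [1.0, 2.0])
```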

pycunls.WeightedFactorBatch#

Wraps any factor batch and scales residuals and Jacobians by a scalar weight. Two construction modes are supported:

  1. Uniform weight (float) — the same scalar is applied to every factor.

  2. Per-factor weights (DevicePointer) — one weight per factor from a GPU array.

C++ contrast: WeightedFactorBatch<T> inherits T::sized_layout; the Python wrapper subclasses FactorBatch only.

Constructors

# Uniform weight
wfb = pycunls.WeightedFactorBatch(inner_factor, weight=2.0)

# Per-factor weights
wfb = pycunls.WeightedFactorBatch(inner_factor, weights=weights_gpu)
  • inner_factor (FactorBatch) — the factor batch to wrap.

  • weight (float) — uniform scalar weight applied to all factors.

  • weights (DevicePointer) — num_factors contiguous floats, one weight per factor.

Exactly one of weight or weights must be provided.

Example

inner = pycunls.PriorVectorFactorBatch3(obs_gpu, N)
wfb   = pycunls.WeightedFactorBatch(inner, weight=5.0)

problem.add_factor_batch(wfb, state_pointers)
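
Because the weight multiplies the residual itself, a least-squares cost built from squared residuals grows with the square of the weight. The minimal sketch below shows this on the CPU; it assumes the cost is the plain sum of squared residuals with no robust loss, and the function name is hypothetical.

```python
def weighted_cost(residual, weight):
    """Sum of squares of a residual after scaling every entry by `weight`."""
    scaled = [weight * r for r in residual]
    return sum(value * value for value in scaled)


residual = [1.0, 2.0]
unweighted = weighted_cost(residual, 1.0)   # 1 + 4
weighted = weighted_cost(residual, 5.0)     # 25 + 100
```

Under that assumption, a factor that should count k times as much in the cost would use a weight of \(\sqrt{k}\) rather than k.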

pycunls.CustomFactorBatch#

Base class for user-defined factors. Subclass this to implement a residual and Jacobian computation that is not available as a built-in factor.

Constructor

class MyFactor(pycunls.CustomFactorBatch):
    def __init__(self, num_factors):
        super().__init__(
            residual_size=...,
            state_block_sizes=[...],
            num_factors=num_factors,
        )
  • residual_size (int) — dimension of the residual vector per factor.

  • state_block_sizes (Sequence[int]) — list of tangent-space dimensions for each state block consumed by one factor (e.g. [1, 1] for a factor reading two scalar states).

  • num_factors (int) — number of factor instances.

Methods to override

  • evaluate(residuals_ptr, jacobians_ptr, state_pointers_ptr, stream_handle) -> bool — computes residuals and Jacobians on the GPU for all factors in the batch. All four arguments are raw int handles:

    • residuals_ptr — device pointer to the output residual buffer. Layout: num_factors × residual_size contiguous floats.

    • jacobians_ptr — device pointer to the output Jacobian buffer. Layout: num_factors × residual_size × sum(state_block_sizes) contiguous floats (row-major per factor, blocks concatenated in state order). May be 0 (null) when the minimizer only needs residuals (e.g. for cost evaluation); in that case skip Jacobian writes.

    • state_pointers_ptr — device pointer to an array of float* pointers. The array has num_factors × len(state_block_sizes) entries. Each entry is the device address of the ambient-space storage for one (factor, state-block) pair, in row-major order. Because Warp kernels cannot perform float** double-pointer indirection, custom factors typically gather state values into contiguous CuPy arrays before launching a kernel (see the Custom Warp Factor tutorial).

    • stream_handle — cudaStream_t cast to int. All GPU work must be launched on this stream.

    Return True on success. The default implementation raises NotImplementedError.
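
The buffer layouts above can be turned into flat-index helpers. The sketch below is one reading of the description: each factor owns a dense row-major matrix of residual_size rows and sum(state_block_sizes) columns, with the blocks laid side by side in state order. That interpretation, and both helper names, are assumptions and should be checked against the Custom Warp Factor tutorial before relying on them.

```python
def jacobian_offset(factor, row, block, col, residual_size, state_block_sizes):
    """Flat float index of one Jacobian entry under the assumed layout."""
    total_cols = sum(state_block_sizes)
    block_start = sum(state_block_sizes[:block])  # first column of this block
    return (factor * residual_size * total_cols
            + row * total_cols
            + block_start
            + col)


def state_pointer_index(factor, block, state_block_sizes):
    """Index into the row-major (factor, state-block) pointer array."""
    return factor * len(state_block_sizes) + block


# Example: 2-D residual reading a 6-dimensional block and a 3-dimensional block.
sizes = [6, 3]
first_entry = jacobian_offset(0, 0, 0, 0, 2, sizes)
second_block_start = jacobian_offset(0, 0, 1, 0, 2, sizes)
last_entry_factor_1 = jacobian_offset(1, 1, 1, 2, 2, sizes)
pointer_slot = state_pointer_index(1, 1, sizes)
```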

pycunls.warp.WarpFactorBatch#

Convenience base for custom factors implemented with NVIDIA Warp kernels. Inherits from CustomFactorBatch and provides helper methods for zero-copy pointer wrapping so you never need to manually construct wp.array objects from raw device addresses. Requires warp-lang.

Constructor

from pycunls.warp import WarpFactorBatch

class MyWarpFactor(WarpFactorBatch):
    def __init__(self, num_factors):
        super().__init__(
            residual_size=...,
            state_block_sizes=[...],
            num_factors=num_factors,
            device="cuda:0",
        )
  • device (str, default "cuda:0") — Warp device string used when creating wp.array wrappers via wrap_array.

Helper methods (inherited — do not override)

  • wrap_array(ptr: int, dtype, shape) -> wp.array — zero-copy wrap of an existing GPU allocation as a Warp array. ptr is the device address, dtype a Warp data type (e.g. wp.float32), and shape an int or tuple giving the array dimensions. The returned wp.array shares the memory; no allocation or copy occurs.

  • make_warp_stream(stream_handle: int) -> wp.Stream — wraps a raw cudaStream_t (passed as int) as a wp.Stream. Use the returned stream in wp.launch(..., stream=stream) to ensure the Warp kernel executes on the minimizer’s CUDA stream.

Methods to override

  • evaluate(residuals_ptr, jacobians_ptr, state_pointers_ptr, stream_handle) -> bool — same contract as CustomFactorBatch.evaluate. Typical implementations:

    1. Gather scattered state pointers into contiguous CuPy arrays (using a CuPy RawKernel or cp.ndarray indexing).

    2. Wrap the contiguous arrays and output buffers with self.wrap_array.

    3. Build a wp.Stream with self.make_warp_stream.

    4. Launch a @wp.kernel on that stream.

See Custom Warp Factor for a complete example.