VEPHand: View-Efficient Photometric Hand Performance Capture at Scale

arXiv Preprint

Google XR

VEPHand is an end-to-end pipeline for high-fidelity dynamic hand performance capture and registration from view-efficient setups (~20 cameras). By combining a mask-free neural reconstruction method with a physics-inspired volumetric registration framework, it robustly captures detailed hand geometry, appearance, and non-linear skin deformations across single-hand, two-hand, and hand-object interaction scenarios.

Pipeline Overview

VEPHand System Pipeline

The pipeline is fully automated and captures high-fidelity 3D/4D hand performances and hand-object interactions from practical view-efficient setups (~20 cameras), without requiring templates or intrusive markers.

The pipeline operates in two primary stages:

  • Mask-Free Neural Reconstruction: Overcomes sparse view overlap and background clutter by using a non-SDF density-based representation with scenario-specific density regularization to robustly extract detailed hand geometry and appearance.
  • Physics-Inspired Volumetric Registration: Aligns the neural reconstructions to a parametric hand template by optimizing intrinsic canonical offsets on personalized tetrahedral meshes. This captures fine surface deformations and ensures physically plausible results during self-contact.
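One reason a tetrahedral mesh is convenient for registration is that volume change can be measured directly per element. The sketch below is illustrative only and is not taken from the VEPHand implementation: it computes signed tetrahedron volumes with NumPy and compares them before and after applying per-vertex offsets. The function name `tet_signed_volumes` and the toy one-tetrahedron mesh are assumptions made for this example.

```python
import numpy as np

def tet_signed_volumes(vertices, tets):
    """Signed volume of each tetrahedron.

    vertices: (V, 3) float array of vertex positions.
    tets:     (T, 4) int array of vertex indices per tetrahedron.
    """
    v0, v1, v2, v3 = (vertices[tets[:, i]] for i in range(4))
    # Volume = det([v1 - v0, v2 - v0, v3 - v0]) / 6
    edges = np.stack([v1 - v0, v2 - v0, v3 - v0], axis=1)  # (T, 3, 3)
    return np.linalg.det(edges) / 6.0

# Toy mesh (hypothetical): a single unit tetrahedron.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
tets = np.array([[0, 1, 2, 3]])

rest_volume = tet_signed_volumes(vertices, tets)

# Hypothetical canonical offsets: push the apex halfway toward the base.
offsets = np.zeros_like(vertices)
offsets[3] = [0.0, 0.0, -0.5]

deformed_volume = tet_signed_volumes(vertices + offsets, tets)
volume_ratio = deformed_volume / rest_volume

print(rest_volume, deformed_volume, volume_ratio)
```

A ratio well below 1 flags volume loss, and a negative ratio flags an inverted element. A registration objective could penalize either, although the exact energy terms used by VEPHand are not specified here.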

Dual-Component Volumetric Deformation Model

Hand Deformation Model

To capture fine-scale dynamic deformations and avoid physically implausible results (such as volume loss at joints or self-collisions), VEPHand employs a physics-inspired, personalized volumetric deformation framework:

  1. Canonical Volumetric Deformation: Fine-grained, non-linear shape changes are captured by optimizing canonical offsets (Δc) on the template's underlying tetrahedral mesh.
  2. Parametric Surface Refinement: Pose-dependent corrective blendshapes are applied to the canonical surface to correct standard skinning artifacts.
  3. Global Skeletal Articulation: The final articulated hand pose is obtained by deforming the refined canonical surface via Linear Blend Skinning (LBS), driven by skeletal pose parameters (β).
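The three steps above can be sketched as a single forward pass. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: offsets are applied directly to surface vertices (the source optimizes them on the underlying tetrahedral mesh), the corrective blendshapes are a linear function of a generic pose feature vector, and the per-joint 4x4 transforms are assumed to have been computed from the skeletal pose parameters beforehand. All names (`deform_hand`, `pose_feature`, `joint_transforms`) are hypothetical.

```python
import numpy as np

def deform_hand(template, offsets, pose_feature, blendshapes,
                weights, joint_transforms):
    """Compose canonical offsets, pose correctives, and LBS.

    template:         (V, 3) canonical surface vertices
    offsets:          (V, 3) canonical offsets (simplified to surface vertices)
    pose_feature:     (P,)   pose-dependent feature vector
    blendshapes:      (P, V, 3) corrective blendshape basis
    weights:          (V, J) skinning weights, each row sums to 1
    joint_transforms: (J, 4, 4) rigid transform per joint
    """
    # 1. Canonical deformation: add the optimized offsets.
    canonical = template + offsets
    # 2. Surface refinement: add pose-dependent corrective blendshapes.
    refined = canonical + np.einsum("p,pvk->vk", pose_feature, blendshapes)
    # 3. Linear Blend Skinning: blend joint transforms per vertex, then apply.
    homogeneous = np.concatenate([refined, np.ones((len(refined), 1))], axis=1)
    blended = np.einsum("vj,jab->vab", weights, joint_transforms)  # (V, 4, 4)
    posed = np.einsum("vab,vb->va", blended, homogeneous)
    return posed[:, :3]

# Toy example (hypothetical values): two vertices, two joints.
template = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
offsets = np.array([[0.0, 0.1, 0.0], [0.0, 0.0, 0.0]])
pose_feature = np.array([1.0])
blendshapes = np.zeros((1, 2, 3))
blendshapes[0, 1] = [0.0, 0.0, 0.2]
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
joint_transforms = np.stack([np.eye(4), np.eye(4)])
joint_transforms[1, :3, 3] = [0.0, 0.0, 2.0]  # joint 1 translates by 2 along z

posed = deform_hand(template, offsets, pose_feature, blendshapes,
                    weights, joint_transforms)
print(posed)
```

Applying the offsets and correctives in canonical space before skinning keeps them independent of the global pose, which is the usual motivation for this ordering in LBS-based models.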

Interactive 3D Visualizer

Interactive 3D reconstruction showing registration results.

BibTeX

If you find our work useful, please cite our paper:

@inproceedings{todo
}