📄 MoBind: Motion Binding for Fine-Grained IMU-Video Pose Alignment
👥 Authors: Duc Duy Nguyen, Tat-Jun Chin, Minh Hoai
📅 Published: February 22, 2026
🎯 What This Research Is About
MoBind introduces a hierarchical contrastive learning framework that creates a unified representation between IMU (Inertial Measurement Unit) sensor signals and video-based pose sequences. This enables accurate cross-modal retrieval, precise temporal synchronization, subject identification, body-part localization, and action recognition.
The framework addresses three key challenges: filtering visual noise from backgrounds, modeling multi-sensor IMU configurations effectively, and achieving fine-grained sub-second temporal alignment between sensor data and video.
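The post gives no code, but the core idea of learning a unified representation via contrastive alignment can be sketched. Below is a minimal symmetric InfoNCE-style objective between paired IMU and pose-sequence embeddings; this is a common formulation for cross-modal contrastive learning, not necessarily MoBind's exact (hierarchical) loss, and the function name and temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce(imu_emb, pose_emb, temperature=0.07):
    """Symmetric InfoNCE loss between matched IMU and pose embeddings.

    imu_emb, pose_emb: (N, D) arrays; row i of each modality is a
    positive (matched) pair, all other rows are negatives.
    Note: this is a generic cross-modal contrastive sketch, not the
    paper's exact hierarchical objective.
    """
    # L2-normalize so the dot product is a cosine similarity
    imu = imu_emb / np.linalg.norm(imu_emb, axis=1, keepdims=True)
    pose = pose_emb / np.linalg.norm(pose_emb, axis=1, keepdims=True)
    logits = imu @ pose.T / temperature  # (N, N) similarity matrix

    def xent(l):
        # cross-entropy with the matched pair on the diagonal
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the IMU->pose and pose->IMU directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Matched embeddings should score a lower loss than deliberately mismatched ones, which is what drives the two encoders toward a shared space.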
💡 Why This Matters
- Precision Motion Tracking: By aligning IMU signals with skeletal motion sequences rather than raw pixels, MoBind isolates motion-relevant cues and achieves sub-second temporal precision.
- Body-Part Awareness: The system decomposes full-body motion into local trajectories for different body parts, pairing each with its corresponding IMU sensor for semantically meaningful alignment.
- Real-World Applications: This technology has immediate applications in sports analytics, rehabilitation monitoring, VR/AR systems, and human-computer interaction where precise motion understanding is critical.
- Superior Performance: Evaluated on mRi, TotalCapture, and EgoHumans datasets, MoBind outperforms existing baselines across cross-modal retrieval, temporal synchronization, localization, and action recognition tasks.
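The body-part decomposition described above can be illustrated with a small sketch: split a full-body pose sequence into local per-sensor trajectories, each paired with the IMU placed on that body part. The sensor names and joint indices below are hypothetical placeholders, since the post does not specify MoBind's sensor layout.

```python
import numpy as np

# Hypothetical sensor-to-joint mapping; placements and joint indices
# are illustrative, not taken from the paper.
SENSOR_TO_JOINTS = {
    "left_wrist":  [4, 5],   # e.g. elbow, wrist joints
    "right_wrist": [7, 8],
    "pelvis":      [0],
}

def local_trajectories(pose_seq, mapping=SENSOR_TO_JOINTS):
    """Split a full-body pose sequence of shape (T, J, 3) into
    per-sensor local trajectories, each centered on its own mean
    so that global body translation is removed."""
    parts = {}
    for sensor, joints in mapping.items():
        traj = pose_seq[:, joints, :]             # (T, len(joints), 3)
        parts[sensor] = traj - traj.mean(axis=0)  # remove global offset
    return parts
```

Each local trajectory can then be aligned contrastively against the signal of its corresponding sensor, which is what makes the pairing semantically meaningful at the body-part level.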
Curated from Hugging Face daily papers on 2026-02-25