Motion and Activity Understanding in 360° Videos: An Egocentric Perspective
Activity recognition is one of the most active fields in computer vision research. Although recent deep learning-based methods have made tremendous progress in traditional video-based activity recognition, 360° activity recognition is still in its infancy. Activity recognition in 360° videos poses challenges such as a lack of datasets, a lack of domain-specific frameworks, and difficulties in motion understanding. This research focuses on two critical aspects of activity recognition in 360° videos from an egocentric perspective: (i) egocentric activity recognition and (ii) motion understanding in 360° videos. Under (i) egocentric activity recognition, we present two works, EgoK360 and VIT360. EgoK360 is an egocentric kinetic human activity dataset comprising human activities and their smaller constituent actions captured from a first-person view; it aims to fill the gap in egocentric activity recognition for 360° videos. VIT360 is a rotation-invariant activity recognition model built on representation learning and current transformer-based techniques from the literature. Under (ii) motion understanding in 360° videos, we present three works: LiteFlowNet360, FLOW360, and SLOF. LiteFlowNet360 is a domain adaptation framework for transferring motion estimation techniques developed for perspective videos to the 360° video setting. FLOW360 is a perceptually natural, synthetic optical flow dataset for motion understanding in 360° videos, the first of its kind in the literature, opening several new opportunities in this domain. Finally, SLOF is a siamese representation learning-based framework for motion estimation in 360° videos. We also discuss the key challenges in these areas and compare the contributions of our work with state-of-the-art frameworks.
Motion understanding, Activity understanding, Egocentric, 360° videos, Optical flow, Omnidirectional flow, Siamese representation learning
Bhandari, K. (2022). <i>Motion and activity understanding in 360° videos: An egocentric perspective</i> (Unpublished dissertation). Texas State University, San Marcos, Texas.