Publication Date

2023

Journal or Book Title

International Journal of Computer Vision

Abstract

A good understanding of geometrical concepts as well as a broad familiarity with objects lead to excellent human perception of moving objects. The human ability to detect and segment moving objects works in the presence of multiple objects, complex background geometry, motion of the observer and even camouflage. How we perceive moving objects so reliably is a longstanding research question in computer vision and borrows findings from related areas such as psychology, cognitive science and physics. One approach to the problem is to teach a deep network to model all of these effects. This is in contrast with the strategy used by human vision, where cognitive processes and body design are tightly coupled and each is responsible for certain aspects of correctly identifying moving objects. Similarly, from the computer vision perspective there is evidence that classical, geometry-based techniques are better suited to the “motion-based” parts of the problem, while deep networks are more suitable for modeling appearance. In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly. We present a novel probabilistic model to estimate the camera’s rotation given the motion field. We then rectify the flow field to obtain a rotation-compensated motion field for subsequent segmentation. This strategy of first estimating camera motion, and then allowing a network to learn the remaining parts of the problem, yields improved results on the widely used DAVIS benchmark as well as the more recent motion segmentation data set MoCA (Moving Camouflaged Animals).

DOI

https://doi.org/10.1007/s11263-023-01859-x

Volume

1

Issue

1

License

UMass Amherst Open Access Policy

Creative Commons License

Creative Commons Attribution 4.0 License
This work is licensed under a Creative Commons Attribution 4.0 License.

Share

COinS