Learning Correspondences For Relative Pose Estimation

We present an end-to-end learnable, differentiable method for pairwise relative pose registration of RGB-D frames. Our method is robust to large camera motions thanks to a self-supervised weighting of the predicted correspondences between the frames. Given a pair of frames, our method estimates point matches together with a visibility score for each match. A self-supervised model then predicts a confidence weight for the visible matches. Finally, the visible matches and their weights are fed into a differentiable weighted Procrustes aligner, which estimates the rigid transformation between the input frames.
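For illustration, below is a minimal sketch of a differentiable weighted Procrustes (Kabsch) solver in PyTorch, the kind of aligner described above. The function name `weighted_procrustes` and its interface are assumptions for this example, not the project's actual code.

```python
import torch

def weighted_procrustes(p, q, w, eps=1e-8):
    """Hypothetical sketch of a differentiable weighted Procrustes alignment.

    p, q : (N, 3) corresponding 3D points (p in frame 1, q in frame 2).
    w    : (N,) non-negative confidence weights for each correspondence.
    Returns R in SO(3) and t such that q ~= p @ R.T + t.
    """
    w = w / (w.sum() + eps)                        # normalise weights
    p_mean = (w[:, None] * p).sum(dim=0)           # weighted centroids
    q_mean = (w[:, None] * q).sum(dim=0)
    p_c, q_c = p - p_mean, q - q_mean
    # Weighted 3x3 cross-covariance matrix
    H = (w[:, None] * p_c).T @ q_c
    U, S, Vh = torch.linalg.svd(H)
    V = Vh.T
    # Reflection correction keeps the estimate a proper rotation
    d = torch.sign(torch.linalg.det(V @ U.T))
    D = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    R = V @ D @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

Because every step is built from differentiable tensor operations (including the SVD), gradients can flow from the estimated pose back to the correspondence weights, which is what makes end-to-end training possible.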

Components of the network
Pipeline of our method. Given a pair of RGB-D images $(I_1, D_1)$ and $(I_2, D_2)$, we estimate the relative pose between the frames as a rotation $R \in SO(3)$ and a translation $t \in \mathbb{R}^3$. First, $I_1$ and $I_2$ are fed into the Correspondence and Visibility Prediction component, and the visible predicted correspondences are weighted by the Correspondence Weighting component. Finally, the weighted correspondences are back-projected into 3D and fed into the Weighted Procrustes aligner, which estimates the relative pose.
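As a sketch of the back-projection step, assuming a standard pinhole camera with known intrinsics $K$, matched pixels and their depths can be lifted into camera coordinates as below; the function `backproject` and its signature are hypothetical.

```python
import torch

def backproject(pixels, depth, K):
    """Hypothetical sketch: lift pixel correspondences into 3D camera coordinates.

    pixels : (N, 2) pixel coordinates (u, v) of the matched points.
    depth  : (N,) depth values sampled from the depth map at those pixels.
    K      : (3, 3) pinhole camera intrinsic matrix.
    Returns (N, 3) points in the camera frame.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = pixels[:, 0], pixels[:, 1]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return torch.stack([x, y, depth], dim=-1)

# Example usage, tying the pipeline together (names are illustrative):
# points1 = backproject(matches1, depth1, K)
# points2 = backproject(matches2, depth2, K)
# R, t = weighted_procrustes(points1, points2, weights)
```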

Marc Benedí San Millán