Abstract: Cross-modal training using 2D-3D paired datasets, such as those containing multi-view images and 3D scene scans, presents an effective way to enhance 2D scene understanding by introducing ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results