Impact of pseudo depth on open world object segmentation with minimal user guidance

  • Pseudo depth maps are depth map predictions which are used as ground truth during training. In this paper we leverage pseudo depth maps in order to segment objects of classes that have never been seen during training. This renders our object segmentation task an open world task. The pseudo depth maps are generated using pretrained networks, which have either been trained specifically to generalize to downstream tasks (LeRes and MiDaS) or in an unsupervised fashion on video sequences (MonodepthV2). In order to tell our network which object to segment, we provide the network with a single click on the object's surface on the pseudo depth map of the image as input. We test our approach in two different scenarios: one without the RGB image and one where the RGB image is part of the input. Our results demonstrate considerably better generalization from seen to unseen object types when depth is used. On the Semantic Boundaries Dataset we achieve an improvement from 61.57 to 69.79 IoU on unseen classes when only half of the classes are used during training and the segmentation is performed on depth maps only.
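The single-click guidance described in the abstract could, for instance, be encoded as an extra input channel stacked onto the pseudo depth map. The sketch below illustrates this idea; the Gaussian click encoding, the `sigma` value, and all function names are illustrative assumptions, not the paper's exact method, and the synthetic depth ramp merely stands in for the output of an estimator such as MiDaS, LeRes, or MonodepthV2:

```python
import numpy as np

def make_click_map(height, width, click_yx, sigma=10.0):
    """Encode a single user click as a Gaussian heatmap (peak 1.0 at the click).

    NOTE: the Gaussian encoding and sigma are illustrative assumptions; the
    paper only states that a single click on the object guides the network.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cy, cx = click_yx
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def build_depth_only_input(pseudo_depth, click_yx):
    """Stack a normalized pseudo depth map with the click map -> (H, W, 2).

    `pseudo_depth` is any H x W array standing in for a monocular depth
    prediction; in the RGB scenario one would concatenate the color
    channels as well.
    """
    d = pseudo_depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # scale depth to [0, 1]
    click = make_click_map(*d.shape, click_yx)
    return np.stack([d, click], axis=-1)

# Example: a synthetic 64x64 depth ramp and a click at pixel (20, 30)
depth = np.tile(np.linspace(0.0, 5.0, 64), (64, 1))
x = build_depth_only_input(depth, (20, 30))
print(x.shape)       # (64, 64, 2)
print(x[20, 30, 1])  # click channel peaks at 1.0 at the clicked pixel
```

The two-channel tensor would then be fed to the segmentation network in the depth-only scenario; encoding the click as a smooth heatmap rather than a single hot pixel is a common design choice in interactive segmentation, since it survives downsampling inside the network.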

Download full text files

  • 103573.pdf (eng, 1032 KB)

    Postprint. © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Metadata
Author:Robin Schön, Katja Ludwig, Rainer Lienhart
URN:urn:nbn:de:bvb:384-opus4-1035735
Frontdoor URL:https://opus.bibliothek.uni-augsburg.de/opus4/103573
ISBN:979-8-3503-0249-3
Parent Title (English):2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 17 2023 to June 24 2023, Vancouver, BC, Canada
Publisher:IEEE
Place of publication:Piscataway, NJ
Type:Conference Proceeding
Language:English
Year of first Publication:2023
Publishing Institution:Universität Augsburg
Release Date:2023/04/14
First Page:4809
Last Page:4819
DOI:https://doi.org/10.1109/CVPRW59228.2023.00509
Institutes:Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Lehrstuhl für Maschinelles Lernen und Maschinelles Sehen
Dewey Decimal Classification:0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing; computer science
Licence:German copyright law (Deutsches Urheberrecht)