3DGeoDet

General-purpose Geometry-aware Image-based 3D Object Detection

Under Review, 2024

1The Hong Kong Polytechnic University
†Corresponding author.

Performance of our proposed 3DGeoDet. (a) We visualize the performance of different methods in terms of mAP@0.5 and mAP@0.25 using varying numbers of views during inference on the ScanNetV2 validation split. 3DGeoDet achieves the best performance among existing methods for all view configurations. (b) 3DGeoDet is trained end-to-end with supervision from ground-truth 3D bounding boxes and, optionally, depth maps, whereas CN-RMA employs a much more complex training strategy that requires ground-truth TSDF volumes and 3D bounding boxes.
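As a rough sketch of this end-to-end setup, the snippet below shows one way the training objective could combine the 3D detection loss with an optional depth term. The function name, signature, and weighting are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Optional
import torch


def total_loss(det_loss: torch.Tensor,
               depth_loss: Optional[torch.Tensor] = None,
               depth_weight: float = 1.0) -> torch.Tensor:
    """Hypothetical combined objective: supervision from ground-truth 3D boxes,
    plus an optional depth-map term when depth supervision is available."""
    loss = det_loss
    if depth_loss is not None:  # depth supervision is optional
        loss = loss + depth_weight * depth_loss
    return loss
```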

Abstract

This paper proposes 3DGeoDet, a novel geometry-aware 3D object detection approach that operates on single- or multi-view RGB images of indoor scenes. The key challenge for image-based 3D object detection is the lack of 3D geometric cues, which leads to ambiguity in establishing correspondences between images and 3D representations. To tackle this problem, 3DGeoDet generates efficient 3D geometric representations in both explicit and implicit forms based on predicted depth information. Specifically, we use the predicted depth to learn voxel occupancy and explicitly optimize the voxelized 3D feature volume through the proposed voxel occupancy attention. To further enhance the 3D awareness of the feature volume, we integrate it with an implicit 3D representation, the truncated signed distance function (TSDF). Without requiring supervision from 3D signals, we significantly improve the model's comprehension of 3D geometry by leveraging intermediate 3D representations, and the model is trained end-to-end. Our approach surpasses state-of-the-art image-based methods on both single- and multi-view benchmark datasets, achieving a 9.3 mAP@0.5 improvement on the SUN RGB-D dataset and a 3.3 mAP@0.5 improvement on the ScanNetV2 dataset. Our image-based method also narrows the performance gap to point cloud-based approaches, achieving comparable results.
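For readers curious about the mechanism, below is a minimal PyTorch sketch of what a voxel occupancy attention step could look like: per-voxel occupancy probabilities gate the lifted feature volume so that likely-occupied voxels are emphasized. This is not the released implementation; the module name, tensor shapes, and the residual sigmoid gating are assumptions made for illustration, and in the actual method the occupancy is tied to the predicted depth.

```python
import torch
import torch.nn as nn


class VoxelOccupancyAttentionSketch(nn.Module):
    """Illustrative sketch (not the official code): re-weight a voxelized
    feature volume with predicted per-voxel occupancy probabilities."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Maps per-voxel features to a single occupancy logit.
        self.occupancy_head = nn.Sequential(
            nn.Conv3d(feat_dim, feat_dim // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(feat_dim // 2, 1, kernel_size=1),
        )

    def forward(self, voxel_feats: torch.Tensor):
        # voxel_feats: (B, C, X, Y, Z) feature volume lifted from image features.
        occ_prob = torch.sigmoid(self.occupancy_head(voxel_feats))  # (B, 1, X, Y, Z), in [0, 1]
        # Residual re-weighting: emphasize voxels that are likely occupied.
        attended = voxel_feats * (1.0 + occ_prob)
        return attended, occ_prob


if __name__ == "__main__":
    feats = torch.randn(1, 64, 40, 40, 16)           # toy 40x40x16 volume, 64-dim features
    module = VoxelOccupancyAttentionSketch(feat_dim=64)
    out, occ = module(feats)
    print(out.shape, occ.shape)                       # (1, 64, 40, 40, 16), (1, 1, 40, 40, 16)
```

In such a sketch, the occupancy probabilities would additionally be supervised by occupancy derived from the predicted depth, consistent with the description above.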


Results on ScanNetV2 Benchmark

Our 3DGeoDet achieves state-of-the-art performance on the ScanNetV2 benchmark. Notably, compared to the point cloud-based method VoteNet, our method achieves better results using only 50 image views. Red indicates the best performance and blue indicates the second-best performance.
ScanNetV2 Benchmark
Our 3DGeoDet outperforms state-of-the-art methods for every number of testing views (note that all methods are trained with 20 views). Red indicates the best performance and blue indicates the second-best performance.
ScanNetV2 Benchmark

ScanNetV2 Visualization Results

We visualize several representative examples of 3DGeoDet on the ScanNetV2 validation set, covering various indoor environments such as living rooms, bedrooms, bathrooms, kitchens, and libraries.

Scene042300 (Ours)

Scene042300 (Ground truth)

Scene070200 (Ours)

Scene070200 (Ground truth)

Scene066400 (Ours)

Scene066400 (Ground truth)

Scene065501 (Ours)

Scene065501 (Ground truth)

Scene065300 (Ours)

Scene065300 (Ground truth)

Scene059802 (Ours)

Scene059802 (Ground truth)

Scene058000 (Ours)

Scene058000 (Ground truth)

Scene055000 (Ours)

Scene055000 (Ground truth)


ScanNetV2 Visualization Results Compared with SOTA Approaches

We compare 3DGeoDet with CN-RMA on the ScanNetV2 validation set. The left is our result and the right is the result from CN-RMA.


(Four interactive image comparisons: Ours vs. CN-RMA)

Results on SUN RGB-D Benchmark

3DGeoDet outperforms ImVoxelNet by 25.4% and 73.5% in mAP@0.25 and mAP@0.5, respectively. Moreover, it significantly narrows the performance gap between monocular detection methods and point cloud-based detection methods (e.g., VoteNet).


SUN RGB-D Benchmark

SUN RGB-D Visualization Results Compared with SOTA Approaches

We compare 3DGeoDet with ImVoxelNet on the SUN RGB-D validation set. The left is our result and the right is the result from ImVoxelNet.


(Four interactive image comparisons: Ours vs. ImVoxelNet)

Acknowledgment

The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust.

BibTeX 🙏