
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection

Published 3 Nov 2024 in cs.CV | (2411.01584v1)

Abstract: The current trend in computer vision is to use one universal model to address a wide variety of tasks. Achieving such a universal model inevitably requires incorporating multi-domain data for joint training to learn across multiple problem scenarios. In point cloud based 3D object detection, however, such multi-domain joint training is highly challenging, because large domain gaps among point clouds from different datasets lead to a severe domain-interference problem. In this paper, we propose OneDet3D, a universal one-for-all model that addresses 3D detection across different domains, including diverse indoor and outdoor scenes, within the same framework and with only one set of parameters. We propose domain-aware partitioning in scatter and context, guided by a routing mechanism, to address the data-interference issue, and further incorporate the text modality for language-guided classification to unify the multi-dataset label spaces and mitigate the category-interference issue. The fully sparse structure and anchor-free head further accommodate point clouds with significant scale disparities. Extensive experiments demonstrate the strong universal ability of OneDet3D to use only one trained model to address almost all 3D object detection tasks.
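
As a rough illustration of the language-guided classification idea mentioned in the abstract, the sketch below merges several per-dataset label lists into one vocabulary and classifies a feature by cosine similarity against per-class text embeddings. It is not the OneDet3D implementation: the dataset names, the class lists, and the `toy_text_embedding` helper (a hash-seeded random vector standing in for a real text encoder) are all assumptions made for this example.

```python
import hashlib
import math
import random


def toy_text_embedding(name, dim=16):
    """Hypothetical stand-in for a text encoder: one deterministic unit vector per class name."""
    seed = int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def unify_label_space(dataset_labels):
    """Merge per-dataset class lists into one sorted vocabulary; shared names collapse to one entry."""
    return sorted({name for labels in dataset_labels.values() for name in labels})


def classify(feature, vocabulary, embeddings):
    """Return the class whose text embedding has the highest cosine similarity with the feature."""
    norm = math.sqrt(sum(f * f for f in feature))
    scores = {
        name: sum(f * e for f, e in zip(feature, embeddings[name])) / norm
        for name in vocabulary
    }
    return max(scores, key=scores.get)


# Hypothetical datasets with partially overlapping label sets (assumed for illustration).
dataset_labels = {
    "indoor_a": ["chair", "table", "bed"],
    "outdoor_b": ["car", "pedestrian", "cyclist"],
    "indoor_c": ["chair", "sofa"],
}

vocabulary = unify_label_space(dataset_labels)
embeddings = {name: toy_text_embedding(name) for name in vocabulary}

# A feature aligned with the "car" text embedding is assigned the "car" label.
predicted = classify(embeddings["car"], vocabulary, embeddings)
```

Because every class is identified by its name embedding and not by a dataset-specific index, a category that appears in two datasets (here `chair`) maps to a single entry in the shared vocabulary.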
