
Add MonoDETR monocular 3D object detection support #925

Draft: msmiatac wants to merge 2 commits into main from monoDETR
@msmiatac

Contributor

Description

Adds support for MonoDETR monocular 3D object detection to DL Streamer by reusing the existing gvadetect element with a new mono3d post-processing converter. The model consumes an image plus camera calibration and the original image size, and produces 2D ROIs annotated with full 3D cuboids (translation, dimensions, orientation).

No new GStreamer element is introduced: gvadetect drives the model, the converter is auto-selected from the model's rt_info (model_type=mono3d), and 3D results flow downstream to gvawatermark3d / gvadeskew through the existing extra_params_json mechanism.
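For illustration, here is a minimal sketch of how one detection's cuboid could be packed into a JSON string of the kind carried by extra_params_json. The `Cuboid3D` struct, the function names, the JSON key names, and the (x, y, z, w) quaternion ordering are all assumptions made for this example; they are not taken from the PR's code, and the actual schema may differ.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical container for one lifted detection; not the PR's actual type.
struct Cuboid3D {
    std::array<double, 3> translation; // camera-frame x, y, z
    std::array<double, 3> dimension;   // object extents (ordering is an assumption)
    double ry;                         // heading around the camera Y axis, radians
};

// Quaternion (x, y, z, w) for a rotation of `ry` radians about the Y axis.
std::array<double, 4> yaw_to_quaternion(double ry) {
    return {0.0, std::sin(ry / 2.0), 0.0, std::cos(ry / 2.0)};
}

// Serialise the cuboid to a JSON string. Key names are illustrative only.
std::string cuboid_to_json(const Cuboid3D &c) {
    const std::array<double, 4> q = yaw_to_quaternion(c.ry);
    char buf[256];
    std::snprintf(buf, sizeof(buf),
                  "{\"translation\":[%.3f,%.3f,%.3f],"
                  "\"dimension\":[%.3f,%.3f,%.3f],"
                  "\"rotation\":[%.3f,%.3f,%.3f,%.3f]}",
                  c.translation[0], c.translation[1], c.translation[2],
                  c.dimension[0], c.dimension[1], c.dimension[2],
                  q[0], q[1], q[2], q[3]);
    return std::string(buf);
}
```

A pure Y-axis rotation only populates the y and w components of the quaternion, which is why a single heading angle is enough to describe the orientation here.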

What's included

  • New mono3d converter (converters/to_roi/mono3d.{h,cpp})

    • Subclasses BlobToROIConverter (need_nms=false) and decodes the five MonoDETR outputs (logits, boxes, 3D dims, depth, angle).
    • Resolves outputs by name (pred_logits, pred_3d_dim) and shape; the two [B,Q,3] outputs are disambiguated by the model's output names.
    • Performs the full 3D lift (sigmoid + top-k scoring, 2D box decode, img_to_rect back-projection, bin-based heading → ry) and emits each detection's 3D cuboid as extra_params_json (translation, dimension, Y-axis rotation quaternion).
  • Camera-intrinsics handling

    • New intrinsics-file property on gva_base_inference (JSON; same schema as gvawatermark3d — supports intrinsic_matrix 3×3, projection_matrix 3×4, optional image_size).
    • Shared mono3d_calibration.{h,cpp} helper to parse intrinsics and write P2/orig_width/orig_height onto GStreamer structures.
  • Auxiliary model-input feeding

    • pre_processors.cpp: new calib and img_sizes input feeders.
    • inference_impl.cpp: auto-detects the MonoDETR calib ([,3,4]) and img-size ([,2]) inputs by shape and injects their preprocessing descriptors.
    • New static ImageInference::GetModelInputShapes (OpenVINO backend) to read input shapes before instantiation.
  • rt_info / model-proc-free configuration

    • model_api_converters.cpp: adds a topk parser branch so the query count can be driven from rt_info.
    • Calibration is injected into the mono3d post-processor in post_processor.cpp.
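
The back-projection and heading steps of the 3D lift listed above can be sketched as follows. This is not the converter's actual code: it assumes the KITTI-style conventions commonly used with MonoDETR (a row-major 3×4 P2 matrix, 12 heading bins, observation angle alpha converted to ry with the ray through the object's pixel column). The type aliases, function signatures, and the bin count are assumptions made for this example.

```cpp
#include <array>
#include <cmath>

// Row-major 3x4 projection matrix (P2 in KITTI terminology).
using Projection = std::array<double, 12>;

struct Point3D {
    double x, y, z;
};

// Back-project pixel (u, v) at the given depth into the rectified camera frame.
Point3D img_to_rect(const Projection &P, double u, double v, double depth) {
    const double fu = P[0], cu = P[2], fv = P[5], cv = P[6];
    // The fourth column carries the camera offset: tx = P[0][3] / -fu, ty = P[1][3] / -fv.
    const double tx = P[3] / -fu;
    const double ty = P[7] / -fv;
    return {(u - cu) * depth / fu + tx, (v - cv) * depth / fv + ty, depth};
}

// Decode the observation angle from a bin index plus a residual.
// num_bins = 12 is an assumption, not a value stated in the PR.
double bin_to_alpha(int bin, double residual, int num_bins = 12) {
    const double pi = std::acos(-1.0);
    double alpha = bin * (2.0 * pi / num_bins) + residual;
    if (alpha > pi)
        alpha -= 2.0 * pi; // wrap into (-pi, pi]
    return alpha;
}

// Convert observation angle alpha to global yaw ry using pixel column u.
double alpha_to_ry(const Projection &P, double alpha, double u) {
    const double pi = std::acos(-1.0);
    double ry = alpha + std::atan2(u - P[2], P[0]);
    if (ry > pi)
        ry -= 2.0 * pi;
    if (ry < -pi)
        ry += 2.0 * pi;
    return ry;
}
```

With a projection matrix whose principal point is (600, 180) and whose focal length is 700, a pixel at the principal point with depth 10 maps to (0, 0, 10), and a pixel 70 columns to the right maps to x = 1.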

@brmarkus


Will you add documentation, samples and examples, too?
