Add MonoDETR monocular 3D object detection support by msmiatac · Pull Request #925 · open-edge-platform/dlstreamer

msmiatac · 2026-06-19T13:20:29Z

Description

Adds support for MonoDETR monocular 3D object detection to DL Streamer by reusing the existing gvadetect element with a new mono3d post-processing converter. The model consumes an image plus camera calibration and the original image size, and produces 2D ROIs annotated with full 3D cuboids (translation, dimensions, orientation).

No new GStreamer element is introduced — gvadetect drives the model, the converter is auto-selected from the model's rt_info (model_type=mono3d), and 3D results flow downstream to gvawatermark3d / gvadeskew via the existing extra_params_json mechanism.

What's included

New mono3d converter (converters/to_roi/mono3d.{h,cpp})
- Subclasses BlobToROIConverter (need_nms=false), decodes the 5 MonoDETR outputs (logits, boxes, 3D dims, depth, angle).
- Resolves outputs by name (pred_logits, pred_3d_dim) and shape; the two [B,Q,3] outputs are disambiguated by the model's output names.
- Performs full 3D lift (sigmoid + top-k scoring, 2D box decode, img_to_rect back-projection, bin-based heading → ry), and emits each detection's 3D cuboid as extra_params_json (translation, dimension, Y-axis rotation quaternion).
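
As a rough illustration of the decode steps listed above, here is a minimal pure-Python sketch. It is not the converter's C++ code. It assumes the conventions of the public MonoDETR/KITTI reference code: scores come from a sigmoid over all query/class pairs, the 3x4 projection matrix carries its offsets in the last column, and heading is predicted as bin logits plus per-bin residuals. All function names and the (w, x, y, z) quaternion ordering are hypothetical choices for this example.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def top_k(logits, k):
    """logits: Q rows of C class logits -> [(score, query, label)], best first."""
    flat = [(sigmoid(v), q, c)
            for q, row in enumerate(logits)
            for c, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    return flat[:k]

def img_to_rect(u, v, depth, p2):
    """Back-project pixel (u, v) at the given depth with a 3x4 matrix p2."""
    fu, fv = p2[0][0], p2[1][1]
    cu, cv = p2[0][2], p2[1][2]
    tx, ty = p2[0][3] / -fu, p2[1][3] / -fv  # assumed KITTI-style offsets
    return ((u - cu) * depth / fu + tx, (v - cv) * depth / fv + ty, depth)

def bin_to_alpha(bin_logits, residuals):
    """Pick the strongest heading bin and add its residual (assumed layout)."""
    cls = max(range(len(bin_logits)), key=lambda i: bin_logits[i])
    angle = cls * (2.0 * math.pi / len(bin_logits)) + residuals[cls]
    return angle - 2.0 * math.pi if angle > math.pi else angle

def alpha_to_ry(alpha, u, p2):
    """Convert observation angle alpha to yaw ry, wrapped to [-pi, pi]."""
    ry = alpha + math.atan2(u - p2[0][2], p2[0][0])
    while ry > math.pi:
        ry -= 2.0 * math.pi
    while ry < -math.pi:
        ry += 2.0 * math.pi
    return ry

def y_rotation_quaternion(ry):
    """Quaternion (w, x, y, z) for a rotation of ry radians about the Y axis."""
    return (math.cos(ry / 2.0), 0.0, math.sin(ry / 2.0), 0.0)
```

With a projection matrix whose principal point is (600, 180) and focal length 700, a pixel 70 columns right of the principal point at depth 10 lands at x = 1.0 in camera coordinates, which is a quick sanity check for the back-projection.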

Camera-intrinsics handling
- New intrinsics-file property on gva_base_inference (JSON; same schema as gvawatermark3d — supports intrinsic_matrix 3×3, projection_matrix 3×4, optional image_size).
- Shared mono3d_calibration.{h,cpp} helper to parse intrinsics and write P2/orig_width/orig_height onto GStreamer structures.
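
To make the intrinsics handling concrete, the sketch below parses a JSON document and derives the P2, orig_width and orig_height values named above. It is an assumption-laden illustration, not the helper from this PR: it assumes the matrices are stored as nested row-major lists, that image_size is ordered [width, height], and that projection_matrix takes precedence when both matrices are present. The function name load_calibration is hypothetical.

```python
import json

def load_calibration(text, fallback_size=None):
    """Parse intrinsics JSON text into P2 plus the optional original image size."""
    cfg = json.loads(text)
    if "projection_matrix" in cfg:
        # Already 3x4: use as-is (precedence over intrinsic_matrix is assumed).
        p2 = [[float(v) for v in row] for row in cfg["projection_matrix"]]
    elif "intrinsic_matrix" in cfg:
        # Extend the 3x3 matrix K to the 3x4 matrix [K | 0].
        p2 = [[float(v) for v in row] + [0.0] for row in cfg["intrinsic_matrix"]]
    else:
        raise ValueError("expected intrinsic_matrix or projection_matrix")
    if len(p2) != 3 or any(len(row) != 4 for row in p2):
        raise ValueError("calibration must resolve to a 3x4 matrix")

    result = {"P2": p2}
    size = cfg.get("image_size", fallback_size)
    if size is not None:
        # Assumed ordering: [width, height].
        result["orig_width"] = int(size[0])
        result["orig_height"] = int(size[1])
    return result
```

When image_size is absent, a caller would presumably fall back to the frame size negotiated by the pipeline; the fallback_size parameter stands in for that here.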

Auxiliary model-input feeding
- pre_processors.cpp: new calib and img_sizes input feeders.
- inference_impl.cpp: auto-detects the MonoDETR calib ([,3,4]) and img-size ([,2]) inputs by shape and injects their preprocessing descriptors.
- New static ImageInference::GetModelInputShapes (OpenVINO backend) to read input shapes before instantiation.
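
The shape-based detection of auxiliary inputs can be pictured with the following sketch. It only illustrates the idea of classifying inputs by rank and trailing dimensions. The input names, the role labels and the rule that any 4D input is the image are assumptions made for this example, and the real logic lives in inference_impl.cpp.

```python
def classify_inputs(shapes):
    """Map each model input name to a role based only on its shape.

    shapes: dict of input name -> sequence of dims (the leading batch
    dimension may be any value, including -1 for a dynamic batch).
    """
    roles = {}
    for name, shape in shapes.items():
        dims = tuple(shape)
        if len(dims) == 3 and dims[1:] == (3, 4):
            roles[name] = "calib"      # per-image 3x4 projection matrix
        elif len(dims) == 2 and dims[1] == 2:
            roles[name] = "img_sizes"  # per-image original size pair
        elif len(dims) == 4:
            roles[name] = "image"      # assumed: the only 4D input is the image
        else:
            roles[name] = "unknown"
    return roles
```

For example, a model with hypothetical inputs `images` (1, 3, 384, 1280), `calibs` (1, 3, 4) and `img_sizes` (1, 2) would be classified as image, calib and img_sizes respectively.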

rt_info / model-proc-free configuration
- model_api_converters.cpp: added a topk parser branch so query-count can be driven from rt_info.
- Calibration is injected into the mono3d post-processor in post_processor.cpp.

brmarkus · 2026-06-19T13:22:45Z

Will you add documentation, samples and examples, too?

msmiatac and others added 2 commits June 19, 2026 15:01

Add monoDETR support

80bc61e

Merge branch 'main' into monoDETR

33d0f8b

msmiatac wants to merge 2 commits into main from monoDETR

2 participants
