Best practices for reproducing EgoDex pretraining pipeline

Hi HoloBrain team,

We noticed that HoloBrain uses the EgoDex dataset during its pretraining phase. Given the growing popularity of egocentric data, we are planning to reproduce this pretraining stage. To match the EgoDex data format, we are trying to bypass the Forward Kinematics (FK) pipeline and feed pose data of shape [bs, 2, 8] directly into the model, where each 8D vector is [openness, x, y, z, qx, qy, qz, qw].

We have made the following modifications to the pipeline and would like to verify whether they match your official EgoDex pretraining setup.

🛠️ Our Modifications
1. Dataset Config: We added an egodex_pretrain configuration dict to dataset_config, set num_joint = 2, and reduced the scale_shift list to exactly 2 entries.
<img width="753" height="971" alt="Image" src="https://github.com/user-attachments/assets/608e9c2d-c16c-4572-b5d0-4ee78c9f0180" />
2. Bypassing FK via Custom Transform: Instead of DualArmKinematics, we implemented a custom EgodexDirectPose class in transforms.py that skips all FK operations during data processing.
This class reshapes the raw 16D EgoDex input into [..., 2, 8], places openness in channel 0, and directly populates data["robot_state"].
joint_relative_pos is set to return torch.zeros(2, 2).

<img width="753" height="971" alt="Image" src="https://github.com/user-attachments/assets/11ece5ee-2e2c-4047-9639-d76c3c8e950c" />

<img width="837" height="1057" alt="Image" src="https://github.com/user-attachments/assets/baa9ff58-bb92-49e9-993c-0994570d709c" />

3. Diffusion Noise Strategy: We changed noise_type from "local_joint" to "local_joint_local_pose". As we understand it, this makes the diffusion model add noise to, and predict, all 8 columns simultaneously, so the recompute fallback should no longer be needed.
4. Loss Function: To prevent accidental FK calls during loss computation, we removed fk_loss_weight and now rely solely on state_loss_weights.

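
For readers who cannot see the screenshots, here is a minimal sketch of the reshaping described in item 2. It is illustrative only and is not the exact code in our EgodexDirectPose class. The function name `egodex_to_robot_state`, the `openness_index` parameter, and the assumed raw layout (two consecutive 8-value blocks, one per hand, with openness in the last slot of each block) are assumptions made for this example. NumPy is used to keep the sketch dependency-light; the real transform works on torch tensors.

```python
import numpy as np


def egodex_to_robot_state(raw, openness_index=7):
    """Reshape raw [..., 16] EgoDex poses into [..., 2, 8] with openness first.

    ASSUMPTION: each hand occupies 8 consecutive values and its openness value
    sits at `openness_index` inside that block. Adjust to the real raw layout.
    """
    raw = np.asarray(raw, dtype=np.float32)
    if raw.shape[-1] != 16:
        raise ValueError(f"expected last dim 16, got {raw.shape[-1]}")

    # Split the flat 16D vector into one 8D block per hand.
    per_hand = raw.reshape(*raw.shape[:-1], 2, 8)

    # Move openness to channel 0 and keep the remaining channels in order,
    # giving [openness, x, y, z, qx, qy, qz, qw] under the assumed layout.
    order = [openness_index] + [i for i in range(8) if i != openness_index]
    return per_hand[..., order]


def joint_relative_pos():
    """Placeholder relative-position matrix for the two pseudo-joints."""
    return np.zeros((2, 2), dtype=np.float32)
```

The result of `egodex_to_robot_state` is what we would write into data["robot_state"]; leading batch or time dimensions pass through unchanged.
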
❓ Our Uncertainties
While this pipeline successfully starts training, we have a couple of questions:

1. Pipeline Alignment: Are the modifications above the standard approach you used for EgoDex pretraining? Are there any other hidden or critical configurations we might have missed?
2. The recompute mechanism: In our EgodexDirectPose class, we kept the joint_state_to_robot_state interface. If the model returns [Batch, 2, 1], we pad the remaining dimensions with zeros to reach [Batch, 2, 8] and set the quaternion component qw = 1.0 to prevent NaN crashes. Under the noise_type="local_joint_local_pose" setting, is there still any chance that this recompute padding function gets triggered, or is it safe to comment it out or remove it?

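
To make question 2 concrete, here is a minimal sketch of the padding fallback we described. It is an illustration under assumptions, not our exact implementation: the free-function form, the `call_count` dict, and the use of NumPy instead of torch are choices made for this example. The counter is one simple way to check empirically whether the fallback is ever hit during training.

```python
import numpy as np

# Hypothetical counter used only to observe whether the fallback is called.
call_count = {"joint_state_to_robot_state": 0}


def joint_state_to_robot_state(joint_state):
    """Pad [B, 2, 1] openness predictions into [B, 2, 8] robot states.

    Position (x, y, z) and qx, qy, qz are zero-filled; qw is set to 1.0 so the
    quaternion is the identity rotation and normalizing it cannot produce NaN.
    """
    call_count["joint_state_to_robot_state"] += 1

    joint_state = np.asarray(joint_state, dtype=np.float32)
    if joint_state.shape[-2:] != (2, 1):
        raise ValueError(f"expected [..., 2, 1], got {joint_state.shape}")

    out = np.zeros(joint_state.shape[:-1] + (8,), dtype=np.float32)
    out[..., 0] = joint_state[..., 0]  # channel 0: openness
    out[..., 7] = 1.0                  # channel 7: qw of the identity quaternion
    return out
```

If `call_count` stays at zero over a full training and evaluation run, that suggests the fallback is not exercised with our settings, though it would not prove that no code path can reach it.
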
Thanks in advance for your time and guidance!

Best practices for reproducing EgoDex pretraining pipeline #52
