
Best practices for reproducing the EgoDex pretraining pipeline #52

@Helloseldon


Hi HoloBrain team,

We noticed that HoloBrain uses the EgoDex dataset during its pretraining phase. Given the growing popularity of egocentric data, we plan to reproduce this pretraining stage. To match the EgoDex data format, we are trying to bypass the Forward Kinematics (FK) pipeline and feed [bs, 2, 8] pose data ([openness, x, y, z, qx, qy, qz, qw]) directly into the model.

We have made the following modifications to the pipeline and would like to verify whether they match your official EgoDex pretraining setup.

🛠️ Our Modifications
1. Dataset Config: We added an egodex_pretrain configuration dict to dataset_config, set num_joint = 2, and reduced the scale_shift list to exactly 2 entries.
2. Bypassing FK via a Custom Transform: Instead of DualArmKinematics, we implemented a custom EgodexDirectPose class in transforms.py that skips FK during data processing. The class reshapes the raw 16-D EgoDex input into [..., 2, 8], moves openness to channel 0, and populates data["robot_state"] directly. Its joint_relative_pos returns torch.zeros(2, 2).


3. Diffusion Noise Strategy: We changed noise_type from "local_joint" to "local_joint_local_pose". This makes the diffusion model add noise to, and predict, all 8 channels simultaneously, which should remove the need for the recompute fallback.
4. Loss Function: To prevent accidental FK calls during loss computation, we removed fk_loss_weight and rely solely on state_loss_weights.
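For concreteness, here is a minimal NumPy sketch of modifications 2-4 as described above. It is not HoloBrain code: the class name EgodexDirectPoseSketch, the input key "egodex_state", the raw layout (two hands stored contiguously, 8 values each, as [x, y, z, qx, qy, qz, qw, openness]), and the helpers noise_columns, add_noise, and state_loss are all hypothetical. noise_columns encodes an assumed reading of the two noise modes, and add_noise uses a plain DDPM forward step rather than the repository's scheduler. The same indexing works unchanged on torch tensors.

```python
import numpy as np

# Hypothetical raw layout: 16 values = 2 hands x [x, y, z, qx, qy, qz, qw, openness].
# RAW_ORDER[i] is the raw index that lands in target channel i, so the output
# is [openness, x, y, z, qx, qy, qz, qw] per hand.
RAW_ORDER = np.array([7, 0, 1, 2, 3, 4, 5, 6])
STATE_DIM = 8


class EgodexDirectPoseSketch:
    """Hypothetical stand-in for EgodexDirectPose: no FK, just reshape + reorder."""

    num_joint = 2  # matches num_joint = 2 in the egodex_pretrain config

    def __call__(self, data):
        raw = np.asarray(data["egodex_state"], dtype=np.float32)  # [..., 16]
        pose = raw.reshape(*raw.shape[:-1], self.num_joint, STATE_DIM)
        data["robot_state"] = pose[..., RAW_ORDER]  # openness moved to channel 0
        return data

    def joint_relative_pos(self):
        # Same role as torch.zeros(2, 2): all-zero relations between the two entries.
        return np.zeros((self.num_joint, self.num_joint), dtype=np.float32)


def noise_columns(noise_type):
    """Assumed reading of the two modes: which state channels get noised."""
    if noise_type == "local_joint":
        return np.arange(1)  # openness only; pose would have to be recomputed
    if noise_type == "local_joint_local_pose":
        return np.arange(STATE_DIM)  # all 8 channels noised and predicted
    raise ValueError(f"unknown noise_type: {noise_type}")


def add_noise(x0, alpha_bar, noise_type, rng):
    """DDPM forward step x_t = sqrt(a)*x0 + sqrt(1-a)*eps on selected channels only."""
    cols = noise_columns(noise_type)
    eps = np.zeros_like(x0)
    eps[..., cols] = rng.standard_normal(x0[..., cols].shape)
    x_t = x0.copy()
    x_t[..., cols] = (np.sqrt(alpha_bar) * x0[..., cols]
                      + np.sqrt(1.0 - alpha_bar) * eps[..., cols])
    return x_t, eps


def state_loss(pred, target, state_loss_weights):
    """Per-channel weighted MSE with no FK term (fk_loss_weight removed)."""
    w = np.asarray(state_loss_weights, dtype=np.float32)  # shape [8]
    return float(np.mean(w * (pred - target) ** 2))
```

In this sketch, "local_joint" leaves channels 1-7 untouched, which is exactly where a recompute step would be needed to fill in the pose; with "local_joint_local_pose" every channel is noised, so the model output already has the full 8-channel layout.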
❓ Our Uncertainties
With these changes the pipeline starts training successfully, but we have two questions:

1. Pipeline Alignment: Are the modifications above the standard approach you used for EgoDex pretraining? Are there any other hidden or critical configurations we might have missed?
2. The Recompute Mechanism: In our EgodexDirectPose class we kept the joint_state_to_robot_state interface: if the model returns [Batch, 2, 1], we zero-pad the remaining dimensions to reach [Batch, 2, 8] and set the quaternion component qw = 1.0 to prevent NaN crashes. Under noise_type="local_joint_local_pose", is there still any code path that triggers this recompute padding, or is it safe to comment it out or remove it?
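One way to check question 2 empirically, instead of removing the fallback outright, is to keep it but make it loud. The sketch below assumes the [openness, x, y, z, qx, qy, qz, qw] layout from above; the strict flag and the warning are hypothetical additions, not part of HoloBrain's joint_state_to_robot_state. Setting qw = 1.0 with the other quaternion components at zero yields the identity rotation, which presumably avoids the 0/0 that normalizing an all-zero quaternion would produce.

```python
import warnings

import numpy as np

QW_INDEX = 7  # layout [openness, x, y, z, qx, qy, qz, qw]
STATE_DIM = 8


def joint_state_to_robot_state(joint_state, strict=False):
    """Hypothetical sketch of the padding fallback, instrumented to report use."""
    joint_state = np.asarray(joint_state, dtype=np.float32)
    if joint_state.shape[-1] == STATE_DIM:
        return joint_state  # full pose already predicted; nothing to recompute
    if joint_state.shape[-1] != 1:
        raise ValueError(f"unexpected last dim: {joint_state.shape[-1]}")
    if strict:
        # Fail fast so any code path that still reaches the fallback is visible.
        raise RuntimeError("recompute fallback triggered with [..., 1] input")
    warnings.warn("recompute fallback triggered; padding to identity pose")
    out = np.zeros(joint_state.shape[:-1] + (STATE_DIM,), dtype=np.float32)
    out[..., 0] = joint_state[..., 0]  # keep the predicted openness
    out[..., QW_INDEX] = 1.0  # identity quaternion (0, 0, 0, 1)
    return out
```

Running some training and evaluation steps with strict=True would show whether any of the code paths those steps exercise still produce [..., 1] outputs under local_joint_local_pose.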
Thanks in advance for your time and guidance!
