Hi, I'm trying to train on a mix of datasets. My data consists of two subsets: the agilex ro dataset and the UMI datasets. The UMI datasets have only one camera view, no depth, and no joint data, so for UMI batches the model computes only the EE joint loss, and the depth branch (3D enhancer) and the FK loss are skipped. The agilex ro dataset has all 3 views and uses the depth branch.

This mechanism works fine in single-GPU training. However, when I train on multiple GPUs, I run into what look like NCCL issues. My guess is that they come from the computation graph differing between the heterogeneous datasets. How can I fix this problem?
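
To make the suspected cause concrete, here is a simplified sketch of the branch gating in plain Python. It is not the real training code: the branch names, the `source` labels, and the helper functions are all hypothetical, and it only models which branches would run on each GPU in one step.

```python
from typing import Dict, List, Set

# Hypothetical branch names; the real model may be organized differently.
ALL_BRANCHES: Set[str] = {"image_encoder", "depth_branch", "ee_head", "fk_head"}


def active_branches(source: str) -> Set[str]:
    """Return the branches that run (and get gradients) for one batch."""
    if source == "agilex":
        # 3 views, depth, and joints are available: everything runs.
        return set(ALL_BRANCHES)
    if source == "umi":
        # One view, no depth, no joints: depth branch and FK loss are skipped.
        return {"image_encoder", "ee_head"}
    raise ValueError(f"unknown batch source: {source}")


def ranks_agree(sources_per_rank: List[str]) -> bool:
    """True if every rank would run the same set of branches in this step."""
    branch_sets = [active_branches(s) for s in sources_per_rank]
    return all(b == branch_sets[0] for b in branch_sets)


def skipped_branches(sources_per_rank: List[str]) -> Dict[int, List[str]]:
    """Map each rank index to the branches it would skip in this step."""
    return {
        rank: sorted(ALL_BRANCHES - active_branches(source))
        for rank, source in enumerate(sources_per_rank)
    }


if __name__ == "__main__":
    # Rank 0 draws an agilex batch while rank 1 draws a UMI batch.
    step = ["agilex", "umi"]
    print(ranks_agree(step))
    print(skipped_branches(step))
```

On a single GPU there is only one rank, so the ranks trivially agree. With several GPUs, one rank can draw an agilex batch while another draws a UMI batch in the same step, and that mismatch is where I suspect the problem starts.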