Make unroll_length schedulable #1833

Merged: QuantuMope merged 1 commit into pytorch from PR/andrew/schedulable-unroll-length on May 27, 2026

@QuantuMope QuantuMope commented Mar 30, 2026

Contributor

This PR allows unroll_length to be scheduled when we are running synchronous off-policy RL training, i.e. when both of the following hold:

  1. async_unroll=False
  2. whole_replay_buffer_training=False

It also allows a scheduled value of 0, which skips unrolling for that iteration so that training draws only from the replay buffer.

This allows us to "simulate" very diverse training strategies. E.g.,

unroll_length = StepScheduler("iterations", [
    # unroll to collect an "offline" dataset
    (1, int(initial_collect_steps / num_para_envs)),
    # perform offline training iterations, no unroll
    (offline_training_iters, 0),
    # continue with online RL
    (offline_training_iters + 1, desired_unroll_length)])
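The three phases of such a schedule can be traced with a toy loop. This is only a sketch: `ToyStepScheduler` and `run` are hypothetical names written for this illustration, and the boundary rule used here (a value applies while the iteration counter is below its boundary, and the last value is used afterwards) is an assumption that may not match ALF's `StepScheduler` exactly.

```python
import bisect


class ToyStepScheduler:
    """Hypothetical step schedule keyed on an iteration counter.

    Assumed semantics for this sketch only: each (boundary, value) pair
    applies while iteration < boundary, and the last value is used once
    all boundaries have been passed.
    """

    def __init__(self, schedule):
        self._boundaries = [b for b, _ in schedule]
        self._values = [v for _, v in schedule]
        self.iteration = 0

    def __call__(self):
        idx = bisect.bisect_right(self._boundaries, self.iteration)
        return self._values[min(idx, len(self._values) - 1)]


def run(unroll_length, num_iterations):
    """Return the action taken at each iteration of a toy training loop."""
    log = []
    for it in range(num_iterations):
        unroll_length.iteration = it
        steps = unroll_length()
        if steps > 0:
            log.append(f"unroll {steps} + train")
        else:
            # A scheduled value of 0 skips unrolling: train from the buffer only.
            log.append("train only")
    return log


offline_training_iters = 4
schedule = ToyStepScheduler([
    (1, 100),                         # collect the "offline" dataset
    (offline_training_iters, 0),      # offline training, no unroll
    (offline_training_iters + 1, 5),  # continue with online RL
])
log = run(schedule, 6)
```

Under the assumed boundary rule, iteration 0 unrolls 100 steps, iterations 1 to 3 train without unrolling, and iterations 4 onward unroll 5 steps each.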

Codex cleverly makes a minimal change with full backward compatibility by adding the following code:

    @property
    def unroll_length(self):
        return self._unroll_length()

    @unroll_length.setter
    def unroll_length(self, value):
        self._unroll_length = as_scheduler(value)
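To see why this keeps existing configs working, here is a minimal self-contained sketch of the pattern. `ConstantScheduler`, `as_scheduler`, and `Config` below are simplified stand-ins written for this illustration, not ALF's actual implementations; the only assumptions are that a scheduler is a zero-argument callable and that `as_scheduler` wraps plain numbers in a constant scheduler.

```python
class ConstantScheduler:
    """Stand-in: a scheduler that always returns the same value."""

    def __init__(self, value):
        self._value = value

    def __call__(self):
        return self._value


def as_scheduler(value):
    """Stand-in: pass callables through, wrap scalars in a ConstantScheduler."""
    if callable(value):
        return value
    return ConstantScheduler(value)


class Config:
    """Hypothetical config holding only the unroll_length field."""

    def __init__(self, unroll_length):
        # Goes through the setter below, so scalars and schedulers both work.
        self.unroll_length = unroll_length

    @property
    def unroll_length(self):
        # Callers still read a plain number; the scheduler is evaluated here.
        return self._unroll_length()

    @unroll_length.setter
    def unroll_length(self, value):
        self._unroll_length = as_scheduler(value)


scalar_config = Config(unroll_length=8)
scheduled_config = Config(unroll_length=lambda: 0)
```

Code that reads `config.unroll_length` gets a number in both cases, so call sites written for a scalar field need no changes.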

Comment thread alf/algorithms/config.py
self.unroll_with_grad = unroll_with_grad
self.use_root_inputs_for_after_train_iter = use_root_inputs_for_after_train_iter
self.async_unroll = async_unroll
if not isinstance(self._unroll_length, ConstantScheduler):

Contributor

ConstantScheduler: should this check against a base class instead, e.g. Scheduler?

Contributor Author

We need to check against ConstantScheduler here because the setter function on line 479 converts a scalar input into a ConstantScheduler before this check runs.

Any non-constant scheduler should then raise an error if we're doing on-policy or async unroll.
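The intent of this check can be sketched as follows. `check_unroll_length` is a hypothetical helper, `ConstantScheduler` is a simplified stand-in, and raising `ValueError` is an assumption made for this illustration; the actual code in alf/algorithms/config.py may signal the error differently.

```python
class ConstantScheduler:
    """Stand-in: what a scalar unroll_length becomes after the setter runs."""

    def __init__(self, value):
        self._value = value

    def __call__(self):
        return self._value


def check_unroll_length(unroll_length, on_policy, async_unroll):
    """Reject a non-constant schedule in modes that do not support it."""
    if not isinstance(unroll_length, ConstantScheduler):
        if on_policy or async_unroll:
            raise ValueError(
                "A scheduled unroll_length requires synced off-policy training")


# A scalar (already wrapped as a constant) is accepted in every mode.
check_unroll_length(ConstantScheduler(8), on_policy=True, async_unroll=True)

# A real schedule is accepted only for synced off-policy training.
check_unroll_length(lambda: 0, on_policy=False, async_unroll=False)

try:
    check_unroll_length(lambda: 0, on_policy=False, async_unroll=True)
    rejected = False
except ValueError:
    rejected = True
```

If `ConstantScheduler` derives from a shared scheduler base class, testing against that base class would also match the wrapped scalar, so the check could no longer tell a plain number from a genuine schedule.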

@QuantuMope QuantuMope left a comment

Contributor Author

Hey Haichao, responded to your comment. Let me know if I misunderstood it.

@QuantuMope

Contributor Author

Gentle reminder for review. Thanks

@QuantuMope QuantuMope merged commit 8954c95 into pytorch May 27, 2026
2 checks passed
@QuantuMope QuantuMope deleted the PR/andrew/schedulable-unroll-length branch May 27, 2026 20:48