[WIP] Support RLDisaggregated by hhaAndroid · Pull Request #1669 · InternLM/xtuner

hhaAndroid · 2026-04-11T09:30:51Z

No description provided.

hhaAndroid · 2026-04-13T08:58:38Z

claude · 2026-04-13T09:55:28Z

+            rollout_controller=rollout_controller,
+            judger=judger,
+            logger=logger,
+        )
+        produce_strategy = task_cfg.produce_strategy_config.build()
+        sampler = task_cfg.sampler_config.build(tokenizer=tokenizer, replay_buffer=replay_buffer)
+        task_runners.append(
+            _TaskRunner(
+                task_name=task_cfg.task_name,


Claude: Warning: _validate_target_task_counts will raise a raw KeyError if required_task_counts contains a key not present in target_task_counts. This can happen if the caller passes mismatched dicts.

Consider using .get() with an explicit check, or validate matching keys first:

Suggested change

-            rollout_controller=rollout_controller,
-            judger=judger,
-            logger=logger,
-        )
-        produce_strategy = task_cfg.produce_strategy_config.build()
-        sampler = task_cfg.sampler_config.build(tokenizer=tokenizer, replay_buffer=replay_buffer)
-        task_runners.append(
-            _TaskRunner(
-                task_name=task_cfg.task_name,
+    @staticmethod
+    def _validate_target_task_counts(
+        required_task_counts: dict[str, int],
+        target_task_counts: dict[str, int],
+    ) -> None:
+        missing_keys = required_task_counts.keys() - target_task_counts.keys()
+        if missing_keys:
+            raise ValueError(
+                f"target_task_counts is missing keys present in required_task_counts: {sorted(missing_keys)}"
+            )
+        undersized_targets = {
+            task_name: {"required": required_task_count, "target": target_task_counts[task_name]}
+            for task_name, required_task_count in required_task_counts.items()
+            if target_task_counts[task_name] < required_task_count
+        }
+        if undersized_targets:
+            raise ValueError(
+                "target_task_batch_sizes must be greater than or equal to required_task_batch_sizes, "
+                f"got {undersized_targets}"
+            )
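A standalone sketch of the suggested validator, pulled out of its class as a free function (the free-function form and the name validate_target_task_counts are assumptions for illustration). It shows why checking key coverage first turns a bare KeyError into a descriptive ValueError. The second error message here uses the parameter names (target_task_counts / required_task_counts) rather than the *_batch_sizes wording in the suggestion.

```python
def validate_target_task_counts(
    required_task_counts: dict[str, int],
    target_task_counts: dict[str, int],
) -> None:
    # Check key coverage first so the lookups below can never raise KeyError.
    missing_keys = required_task_counts.keys() - target_task_counts.keys()
    if missing_keys:
        raise ValueError(
            f"target_task_counts is missing keys present in required_task_counts: {sorted(missing_keys)}"
        )
    # Collect every task whose target count is smaller than the required count.
    undersized_targets = {
        task_name: {"required": required, "target": target_task_counts[task_name]}
        for task_name, required in required_task_counts.items()
        if target_task_counts[task_name] < required
    }
    if undersized_targets:
        raise ValueError(
            "target_task_counts must be greater than or equal to required_task_counts, "
            f"got {undersized_targets}"
        )
```

With required_task_counts={"math": 4, "code": 2} and target_task_counts={"math": 8}, this sketch raises a ValueError that names 'code', instead of the KeyError: 'code' the unchecked comprehension would produce.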

claude · 2026-04-13T09:55:37Z

+                task_name=task_cfg.task_name,
+                agent_loop=agent_loop,
+                produce_strategy=produce_strategy,
+                sampler=sampler,


Claude: Warning (CLAUDE.md): Missing return type annotation. Per the project standards, all new code must include type hints for function signatures.

Suggested change

-                sampler=sampler,
+    def _get_shared_rollout_ctl(self, active_tasks: list[_TaskRunner]) -> RolloutController | None:

claude · 2026-04-13T09:55:52Z

+        enable_partial_rollout=enable_partial_rollout,
+    )
+
+    result = ProduceBatchResult(rollout_states=[])


Claude: Warning (CLAUDE.md): Multiple functions and __init__ accept a logger parameter with no type annotation. The project standard requires type hints for all function signatures.

This applies to build_task_runners, build_task_runner, BaseAgentLoopManager.__init__, _produce_single_task_batch, and _produce_single_task_window_to_replay_buffer.

Suggested type: logger: logging.Logger | None = None

claude · 2026-04-13T09:57:38Z

+import json
+import math
+from pathlib import Path
+from typing import Union


Claude: Warning: from typing import Union is imported but Python >= 3.10 uses native str | Path syntax. This import is unnecessary and the usages should be modernized per CLAUDE.md.

claude · 2026-04-13T09:57:53Z

+import ray
+from mmengine.dist import get_rank
+from mmengine.runner import set_random_seed
+from pydantic import BaseModel, ConfigDict


Claude: Warning: typing_extensions.Literal is imported but Literal has been available in typing since Python 3.8. Since the project targets Python >= 3.10, use from typing import Literal instead.

Suggested change

-from pydantic import BaseModel, ConfigDict
+from typing import Literal
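A small sketch of typing.Literal imported from the standard library instead of typing_extensions. The alias RolloutMode, its values, and check_mode are hypothetical; the file's actual Literal usages are not visible in the diff. pydantic models validate Literal-annotated fields in the same spirit, rejecting values outside the listed options.

```python
from typing import Literal, cast, get_args

# Hypothetical alias; the real Literal values in the file are not shown in the diff.
RolloutMode = Literal["colocated", "disaggregated"]


def check_mode(mode: str) -> RolloutMode:
    """Validate a string against the allowed Literal values at runtime."""
    allowed = get_args(RolloutMode)  # ("colocated", "disaggregated")
    if mode not in allowed:
        raise ValueError(f"mode must be one of {allowed}, got {mode!r}")
    return cast(RolloutMode, mode)
```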

YanhuiDua · 2026-04-14T10:56:54Z

@@ -0,0 +1,560 @@
+# modified from https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/mismatch_helper.py


Why add another rollout_is.py? There is already a rollout_is.py under rl/trainer.

YanhuiDua · 2026-04-14T10:58:31Z

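+    # - disaggregated requires the trainer to decide the window cadence, so partial_rollout must live at the trainer layer
+    #
+    # So although both sides are called "partial rollout", they are not the same knob and cannot be forced into a single config layer.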
+    produce_batch_enable_partial_rollout: bool = False


I think these two parameters should keep their previous names.

hhaAndroid added 5 commits April 10, 2026 13:01

update

9ad8f03

update design

0aaf11b

update

f14f5b4

add comment

a3a8afa

update config

eca3c39

claude bot reviewed Apr 13, 2026

Comment thread xtuner/v1/rl/agent_loop/manager_base.py Outdated

claude bot reviewed Apr 13, 2026

Comment thread xtuner/v1/train/rl_disaggregated_trainer.py Outdated

claude bot reviewed Apr 13, 2026

Comment thread xtuner/v1/train/rl_disaggregated_trainer.py Outdated

YanhuiDua reviewed Apr 14, 2026

Comment thread xtuner/v1/train/rl_disaggregated_trainer.py Outdated

YanhuiDua reviewed Apr 14, 2026

Comment thread xtuner/v1/train/rl_disaggregated_trainer.py Outdated

YanhuiDua reviewed Apr 14, 2026

Comment thread xtuner/v1/rl/rollout/utils.py Outdated

YanhuiDua reviewed Apr 14, 2026

Comment thread xtuner/v1/rl/agent_loop/producer.py

hhaAndroid added 6 commits April 15, 2026 02:32

refactor design

7318bfe

Merge upstream/rl_design into add_disagg

ca2cfbf

update

cda0ec7

revert

1aa2f53

remove is

ec9ddf8

Merge upstream/rl_design into add_disagg

beeed7d

YanhuiDua reviewed Apr 15, 2026

Comment thread xtuner/v1/rl/agent_loop/producer.py Outdated

YanhuiDua reviewed Apr 15, 2026

Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py Outdated

YanhuiDua reviewed Apr 15, 2026

Comment thread xtuner/v1/rl/agent_loop/agent_loop_manager.py Outdated

refactor agent loop manager into producer consumer api and mv disaggregated logic to rl_trainer (#8)

7ac01a7

merge

c0eb269

2 participants