feat: add as_generic_chat_history function to convert any Context to …#1007
feat: add as_generic_chat_history function to convert any Context to …#1007akihikokuroda wants to merge 6 commits intogenerative-computing:mainfrom
Conversation
…Message Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
| return c.parsed_repr | ||
| # Use value if parsed_repr is None or not a Message | ||
| content = ( | ||
| str(c.value) if c.parsed_repr is None else formatter(c.parsed_repr) |
There was a problem hiding this comment.
I think we also need to consider the case if c.parsed_repr is a string (I believe that's what a MOT will return after a completed CBlock action (?) )
quick way to test:
mot = ModelOutputThunk(value='reply text')
mot.parsed_repr = 'reply text'
as_generic_chat_history(ChatContext().add(Message('user','hi')).add(mot))
# → WARNING: Unknown component type str ...
There was a problem hiding this comment.
Yes, it was an issue. I fix it. Thanks!
There was a problem hiding this comment.
should we add a test case for it too?
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
|
there's a typo in the PR title: |
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
|
|
||
| def as_generic_chat_history( | ||
| ctx: Context, formatter: Callable[[Any], str] | None = None | ||
| ) -> list[Message]: |
There was a problem hiding this comment.
Minor / style: Small typing issue here — both _default_formatter and the formatter parameter use Any, which the project convention asks us to avoid unless an external library forces it (nothing does here). object is the right type: it accepts everything, is covariant, and keeps mypy happy with callers that pass typed formatters.
def _default_formatter(obj: object) -> str: ...
formatter: Callable[[object], str] | None = NoneThe test helpers already use obj: object, so it is just the library code that needs updating.
| # Use value if parsed_repr is None or some other type | ||
| content = ( | ||
| str(c.value) if c.parsed_repr is None else formatter(c.parsed_repr) | ||
| ) |
There was a problem hiding this comment.
Two things worth noting on this line — one minor, one more significant.
Minor / style: When parsed_repr is a dict or similar, the default formatter will log "Unknown component type dict" — but dict is not an unexpected component here, it is a parsed form of a ModelOutputThunk we already recognise. That message will be confusing to anyone monitoring logs. A dedicated log line would be clearer:
_logger.warning(
"ModelOutputThunk.parsed_repr is %s, not a Message; falling back to value.",
type(c.parsed_repr).__name__,
)More significant: If both parsed_repr and value are None (a ModelOutputThunk that has not been evaluated yet), str(None) silently produces the literal string "None" in the chat history with no warning logged. A backend receiving that will have no indication anything went wrong. Worth guarding explicitly:
if c.value is None:
raise ValueError("ModelOutputThunk has no value and no parsed_repr — was it evaluated?")| return Message(role="assistant", content=content) | ||
| case CBlock(): | ||
| return Message(role="user", content=str(c)) | ||
| case _: |
There was a problem hiding this comment.
More significant: One tricky edge case: ImageBlock inherits from CBlock (see core/base.py:79), so it will fall into this arm and produce Message(role="user", content=str(image_block)) — meaning a raw base64 PNG string ends up in the chat history as a text message. This is a reachable code path if a context with image content is passed in, and the corruption would be invisible downstream.
A simple guard fixes it:
case CBlock():
if type(c) is not CBlock:
return Message(role="user", content=formatter(c))
return Message(role="user", content=str(c))That way any CBlock subclasses not explicitly handled go through the formatter instead.
planetf1
left a comment
There was a problem hiding this comment.
Generally looks good - just came across 2 possible corruption issues. 1 is an edge case, the other affects images. 2 other minors.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Co-authored-by: Paul Schweigert <paul@paulschweigert.com>
markstur
left a comment
There was a problem hiding this comment.
No blockers from me unless you agree to make formatter more general purpose.
Bob had some suggestions below. I'd probably do #2, but optional/FYI --
Suggestions (Optional Improvements)
- Type Narrowing: Consider using typing.TypeGuard for _to_msg to help type checkers understand the return type better.
- Performance: The nested _to_msg function is recreated on every call. Consider making it a module-level function if performance matters.
- Consistency with as_chat_history(): Both functions raise the same generic Exception for non-linear contexts. Consider extracting this to a shared helper or using a custom exception type.
| Returns: | ||
| List of ``Message`` objects in conversation order. | ||
|
|
||
| Raises: |
There was a problem hiding this comment.
also:
ValueError: If a ModelOutputThunk has neither value nor parsed_repr.
| @@ -0,0 +1,85 @@ | |||
| # pytest: unit | |||
There was a problem hiding this comment.
I thought we default to unit and don't put this in, but at the moment I'm not sure if that applies to docs/examples.
There was a problem hiding this comment.
pytest: unit in docs/examples/as_generic_chat_history.py is correct per AGENTS.md — examples require this opt-in comment to be collected.
|
|
||
| def _to_msg(c: CBlock | Component | ModelOutputThunk) -> Message: | ||
| match c: | ||
| case Message(): |
There was a problem hiding this comment.
as a naive user I expected to be able to format a Message.
I might be totally wrong about the use case, but hard-coding a "format=" to be so specialized seems odd. Why doesn't the formatter work on the prepared final message/content instead of only certain types of content.
There was a problem hiding this comment.
Formatting apply to the Message content itself would be a breaking change and might not be the intended use case.
|
|
||
| all_ctx_events = ctx.as_list() | ||
| if all_ctx_events is None: | ||
| raise Exception("Trying to cast a non-linear history into a chat history.") |
| assert len(history) == 2 | ||
| assert history[0].role == "user" | ||
| assert history[0].content == "inline content" | ||
|
|
There was a problem hiding this comment.
Bob suggested test (needs testing):
Missing test: The CBlock subclass path (lines 347-348) isn't explicitly tested. The test at line 357-365 uses a plain CBlock, not a subclass.
Recommendation: Add a test:
def test_as_generic_chat_history_with_cblock_subclass():
"""CBlock subclasses use the formatter."""
class CustomCBlock(CBlock):
pass
ctx = ChatContext()
ctx = ctx.add(CustomCBlock("custom content"))
history = as_generic_chat_history(ctx)
assert len(history) == 1
assert history[0].role == "user"
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Misc PR
Type of PR
Description
Add as_generic_chat_history() function to handle heterogeneous contexts with
configurable formatter. Unlike as_chat_history(), this gracefully handles
arbitrary component types by converting them to strings, with optional logging
for unknown types.
Testing
Attribution