Problem
_handle_note and _handle_call_api in phone_agent/actions/handler.py are no-op placeholders:
def _handle_note(self, action, width, height):
    return ActionResult(True, False)  # placeholder

def _handle_call_api(self, action, width, height):
    return ActionResult(True, False)  # placeholder
The system prompt instructs the model to use Note to record page content and Call_API to summarize:
- do(action="Note", message="True")
  Record the current page content for later summarization.
- do(action="Call_API", instruction="xxx")
  Summarize or comment on the current page or the recorded content.
But neither action actually does anything: both handlers return immediately without recording or summarizing any content. As a result, tasks like "browse multiple restaurant listings and summarize the top recommendations" silently fail.
Proposed Fix
- Update the system prompt so the model writes page content into Note's message field (currently it passes "True" as a static value)
- Implement Note/Call_API handlers with a lightweight in-memory buffer in agent.py
- Inject Call_API results into the next step's user message (preserving the user/assistant alternation pattern)
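To make the three points above concrete, here is a minimal sketch of what the in-memory buffer could look like. Everything in it is hypothetical and not taken from the repository: the NoteBuffer class, its method names, the summarize callable (standing in for the single model sub-request), and the "[Call_API result]" marker are all assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class NoteBuffer:
    """Hypothetical in-memory store for Note messages and the latest Call_API result."""

    notes: List[str] = field(default_factory=list)
    pending_result: Optional[str] = None

    def add_note(self, message: str) -> None:
        # Skip empty messages and the legacy static "True" value.
        text = message.strip()
        if text and text != "True":
            self.notes.append(text)

    def call_api(self, instruction: str, summarize: Callable[[str], str]) -> str:
        # One sub-request: the instruction followed by every note recorded so far.
        prompt = instruction + "\n\n" + "\n".join(self.notes)
        self.pending_result = summarize(prompt)
        return self.pending_result

    def build_user_message(self, screen_info: str) -> str:
        # Fold the pending result into the next user turn, then clear it,
        # so no extra message is inserted between user and assistant turns.
        if self.pending_result is None:
            return screen_info
        text = f"[Call_API result]\n{self.pending_result}\n\n{screen_info}"
        self.pending_result = None
        return text
```

Prepending the result to the next user message, rather than appending a separate message, is one way to keep the user/assistant alternation intact; the actual PR may choose a different injection point.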
Scope
- 4 files changed: actions/handler.py, agent.py, prompts_zh.py, prompts_en.py
- ~95 lines total, no new files
- No change to the main agent loop architecture
- No extra model calls per step (only Call_API triggers one sub-request)
- Backward compatible: default callbacks are no-ops
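As a rough illustration of the backward-compatibility point, the handler could accept optional callbacks that default to no-ops, so existing callers that construct it without arguments see the same behavior as today. This is only a sketch under assumptions: the constructor parameters note_callback and call_api_callback are hypothetical, the ActionResult below is a stand-in with assumed field names, and action is assumed to be a dict.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionResult:
    # Stand-in for the project's ActionResult; field names are assumed.
    success: bool
    should_finish: bool


class ActionHandler:
    def __init__(
        self,
        note_callback: Optional[Callable[[str], None]] = None,
        call_api_callback: Optional[Callable[[str], None]] = None,
    ):
        # Defaults are no-ops, matching the current placeholder behavior.
        self.note_callback = note_callback or (lambda message: None)
        self.call_api_callback = call_api_callback or (lambda instruction: None)

    def _handle_note(self, action: dict, width: int, height: int) -> ActionResult:
        # Forward the model-written page content to whoever owns the buffer.
        self.note_callback(action.get("message", ""))
        return ActionResult(True, False)

    def _handle_call_api(self, action: dict, width: int, height: int) -> ActionResult:
        # Forward the instruction; the callback decides whether to call the model.
        self.call_api_callback(action.get("instruction", ""))
        return ActionResult(True, False)
```

With this shape, the handler itself never calls the model, so the only extra request would come from whatever the agent passes as call_api_callback.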
I have a detailed implementation plan ready and would like to submit a PR for this. Let me know if this direction makes sense.