Skip to content

fix: preserve pending messages when LLM returns non-XML or error responses#1643

Closed
gchaix wants to merge 6 commits intothedotmack:mainfrom
gchaix:fix/preserve-failed-observations
Closed

fix: preserve pending messages when LLM returns non-XML or error responses#1643
gchaix wants to merge 6 commits intothedotmack:mainfrom
gchaix:fix/preserve-failed-observations

Conversation

@gchaix
Copy link
Copy Markdown

@gchaix gchaix commented Apr 7, 2026

processAgentResponse() called confirmProcessed() (which DELETEs the pending_messages row) regardless of whether the LLM actually returned parseable XML. Any unexpected response — auth errors, rate limits, empty responses, garbled output — silently deleted the queued tool interactions with zero observations stored.

This was observed in production when stale auth tokens caused the Claude SDK to return 401 Invalid bearer token as response text. The parser found no XML, the warning was logged, and then confirmProcessed() deleted the queue entry.

What Changed

ResponseProcessor.ts — the core fix:

  • Added isRateLimitResponse() and isAuthErrorResponse() classification helpers
  • Added preserveMessages() — calls markFailed() on all in-flight message IDs, routing them through the existing retry logic (up to 3 retries, then permanently failed but visible in the queue UI) rather than silently deleting them
  • processAgentResponse() now returns ProcessAgentResponseResult (status: ok | rate_limited | error | empty) instead of void
  • All non-XML paths now return early via preserveMessages():
    • Empty response with pending messages → error
    • Rate-limit text → rate_limited
    • Auth error text → error
    • Unknown non-XML → error

Agent call sites — all 6 processAgentResponse calls updated:

  • SDKAgent.ts — captures result, aborts session on rate_limited
  • GeminiAgent.ts — captures result at all 3 call sites; replaces the "leave for stale recovery" pattern (5-min wait) with immediate markFailed() calls
  • OpenRouterAgent.ts — captures result at both call sites, aborts on rate_limited

types.ts — adds ProcessAgentResponseResult interface

index.ts — exports ProcessAgentResponseResult, isRateLimitResponse, isAuthErrorResponse

Tests

24 tests pass (bun test tests/worker/agents/response-processor.test.ts).

Updated existing tests:

  • Non-XML response test now asserts markFailed() is called and storeObservations() is not called

New tests:

  • Rate-limit text → markFailed() called, status rate_limited
  • Auth error text → markFailed() called, status error
  • Empty response with pending messages → markFailed() called, status error
  • Empty response with no pending messages (init prompt) → storeObservations() called normally, status empty
  • Plain-text non-XML with pending messages → markFailed() called, status error
  • isRateLimitResponse() unit tests
  • isAuthErrorResponse() unit tests

Existing Safety Properties Preserved

  • confirmProcessed() is still called only after successful storeObservations() on the happy path
  • Init prompts (no pending messages) still call storeObservations() with 0 observations — no session timestamp side-effects
  • markFailed() retries up to 3× then marks permanently failed (already the behaviour for rate-limit errors in OpenRouter batch processing)
  • The startup self-heal (resetStaleProcessingMessages(0)) and the 5-minute stale claim reset are unchanged

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Summary by CodeRabbit

Release Notes

  • New Features

    • Added CLI tools to inspect, filter, delete, and requeue failed messages from the queue.
  • Bug Fixes

    • Improved handling of rate-limited and authentication error responses with proper session cleanup.
    • Empty observations are now skipped during processing.
    • Added conversation history rollback on error and rate-limit responses.
  • Documentation

    • Clarified summary prompt instructions regarding observation output handling.
  • Chores

    • Enhanced tag stripping to remove task notification blocks from responses.

Walkthrough

Call sites now await and inspect a structured ProcessAgentResponseResult from processAgentResponse. The processor classifies plain-text rate-limit/auth errors, preserves or fails pending messages, and returns detailed statuses so agents can abort, rollback, or continue without committing assistant text on early/error paths.

Changes

Cohort / File(s) Summary
Agent callers
src/services/worker/GeminiAgent.ts, src/services/worker/OpenRouterAgent.ts, src/services/worker/SDKAgent.ts
Callers now store processAgentResponse results and branch on status. Agents abort or return early on rate_limited, rollback recent conversation entries on rate_limited/error, and avoid committing assistant text on early/error outcomes.
Response processing core
src/services/worker/agents/ResponseProcessor.ts, src/services/worker/agents/types.ts, src/services/worker/agents/index.ts
processAgentResponse returns ProcessAgentResponseResult (added type), introduces isRateLimitResponse/isAuthErrorResponse, implements preserveMessages/mark-failed behavior, distinguishes skipped/empty/error/rate_limited/ok, and avoids updating conversation history on early/error paths; exports updated.
Pending message store & draining
src/services/sqlite/PendingMessageStore.ts, tests/services/sqlite/pending-message-drain.test.ts
Added drainSessionMessages(sessionDbId) to requeue or permanently fail messages based on retry counts; tests added to validate requeue, fail, and mixed scenarios.
Session lifecycle & routes
src/services/worker-service.ts, src/services/worker/session/SessionCompletionHandler.ts, src/services/worker/http/routes/SessionRoutes.ts
Replaced abandoned-message logic with drainSessionMessages usage and updated logging; status route now computes queueLength from pendingStore when session is missing.
Tests (response processor)
tests/worker/agents/response-processor.test.ts
Extended to cover classifiers, markFailed vs confirmProcessed behaviors, empty vs error outcomes depending on pending messages, and that conversation history is not updated on early/error/rate-limit paths; added unit tests for new classifiers.
CLI / scripts / SDK / utils
src/cli/handlers/summarize.ts, scripts/inspect-failed.sh, scripts/requeue-failed.sh, src/sdk/parser.ts, src/sdk/prompts.ts, src/utils/tag-stripping.ts
Minor type tweak in summarize handler; added inspection and requeue helper scripts; parseObservations skips empty observations; summary prompt clarified to forbid <observation> output; added <task-notification> stripping and counting.

Sequence Diagram

sequenceDiagram
    participant Agent as Agent (Gemini / OpenRouter / SDK)
    participant RP as ResponseProcessor
    participant Store as PendingMessageStore
    participant Session as Session

    Agent->>RP: processAgentResponse(response, session,...)
    alt Empty/whitespace && processingMessageIds present
        RP->>Store: confirmProcessed or markFailed(processingMessageIds)
        RP-->>Agent: {status: 'skipped' or 'empty', observationCount:0, summaryStored:false}
    else Plain-text rate-limit/auth detected
        RP->>Store: markFailed(processingMessageIds)
        RP-->>Agent: {status: 'rate_limited'|'error', observationCount:0, summaryStored:false}
        Agent->>Session: abort/rollback or remove last conversation entry
    else Successful XML/observations parsed
        RP->>Store: storeObservations(parsedObservations)
        RP-->>Agent: {status: 'ok', observationCount:N, summaryStored:bool}
        Agent->>Session: commit conversation history and continue
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I sniffed the logs and hopped with glee,
Messages saved or marked where they should be,
Rate limits warned, rollbacks neat,
Processors tidy, no duplicate seat,
A rabbit cheers this orderly spree 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: preventing deletion of pending messages when LLM responses are non-XML or contain errors, which is the core fix implemented across all files in this PR.
Description check ✅ Passed The description comprehensively explains the problem (messages were silently deleted on error/non-XML responses), the root cause (production incident with auth errors), and the detailed solution with code changes, test coverage, and preserved safety properties.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/services/worker/OpenRouterAgent.ts (1)

203-223: ⚠️ Potential issue | 🟠 Major

Rollback retry-preserved prompts from session.conversationHistory.

The prompt is already appended at Lines 190-191 and 241-242 before the LLM call. When processAgentResponse() returns 'rate_limited' or 'error', that queue item is preserved for retry, so leaving the prompt in session.conversationHistory means the retry adds a second copy and every later OpenRouter request carries duplicated context. Pop the just-added prompt, or only commit it after a confirmed response.

💡 Minimal fix
           );
+          if (obsResult.status === 'rate_limited' || obsResult.status === 'error') {
+            session.conversationHistory.pop();
+          }
 
           if (obsResult.status === 'rate_limited') {
             logger.warn('SDK', 'OpenRouter rate-limited during observation, aborting session', {
               sessionId: session.sessionDbId
             });
             return;
           }
           );
+          if (summaryResult.status === 'rate_limited' || summaryResult.status === 'error') {
+            session.conversationHistory.pop();
+          }
 
           if (summaryResult.status === 'rate_limited') {
             logger.warn('SDK', 'OpenRouter rate-limited during summary, aborting session', {
               sessionId: session.sessionDbId
             });
             return;
           }

Also applies to: 254-274

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/OpenRouterAgent.ts` around lines 203 - 223, The code
appends the prompt to session.conversationHistory before calling
processAgentResponse, but when processAgentResponse returns 'rate_limited' or
'error' the item is preserved for retry and the pre-appended prompt becomes
duplicated on the next attempt; locate the prompt append in the OpenRouterAgent
method (the block around the LLM call where session.conversationHistory is
mutated at the earlier append lines) and either (A) only push/commit the prompt
after processAgentResponse returns a successful outcome, or (B) immediately
remove the just-added prompt when obsResult.status is 'rate_limited' or 'error'
(e.g., pop the last conversationHistory entry or remove the matching item) and
apply the same change for the other observation/response block referenced (the
region around lines 254-274) so retries do not accumulate duplicate context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 2509-2514: The repeated Markdown headings "## What's Changed" and
"## New Contributors" violate MD024 and lack the required blank line for MD022;
update those heading instances by making them unique (e.g., "## What's Changed —
vX.Y.Z" or "## New Contributors — vX.Y.Z") or convert them to non-heading text
(bold or plain) and ensure each heading is followed by a blank line; apply the
same change to other repeated blocks in the file to resolve both MD024 and
MD022.
- Around line 2241-2248: The markdown block around the Deleted/What remains
section is missing required blank lines, causing MD022/MD031; open CHANGELOG.md,
find the headings "### Deleted" and "### What remains", the fenced code block
that follows, and the "## Dependencies" heading, and insert a single blank line
before and after each of those headings and around the fenced code block so each
block is separated by blank lines (also apply the same fixes to the similar
block near the "2276-2277" area).

In `@src/services/worker/agents/ResponseProcessor.ts`:
- Around line 36-52: The regex in isRateLimitResponse is written as `/reset
s?/…/` which incorrectly requires a space between "reset" and the optional "s"
and therefore never matches phrases like "resets at 7pm"; update the pattern
used in isRateLimitResponse to match either "reset" or "resets" without a
spurious space (for example use a pattern equivalent to "resets?" followed by
whitespace and either "at <digit>" or "in <digit>" with case-insensitive
matching and optional word boundaries) so that strings like "resets at 7pm" and
"resets in 2 hours" are detected as rate-limited.
- Around line 160-196: The guard currently preserves only plain non-XML text,
allowing truncated/malformed XML (e.g. "<observation><type>...") to fall through
when observations.length === 0 and summary is falsy; update the logic in
ResponseProcessor (around variables text, observations, summary, agentName) to
also preserve messages whenever the parser produced no observations/summary even
if raw text contains XML markers: call preserveMessages(session, sessionManager,
worker, 'malformed_xml') and return error/rate_limited status as appropriate
(also reuse isRateLimitResponse/isAuthErrorResponse checks and logger calls), so
unparsable/truncated XML responses are never silently discarded before
storeObservations() / confirmProcessed().

---

Outside diff comments:
In `@src/services/worker/OpenRouterAgent.ts`:
- Around line 203-223: The code appends the prompt to
session.conversationHistory before calling processAgentResponse, but when
processAgentResponse returns 'rate_limited' or 'error' the item is preserved for
retry and the pre-appended prompt becomes duplicated on the next attempt; locate
the prompt append in the OpenRouterAgent method (the block around the LLM call
where session.conversationHistory is mutated at the earlier append lines) and
either (A) only push/commit the prompt after processAgentResponse returns a
successful outcome, or (B) immediately remove the just-added prompt when
obsResult.status is 'rate_limited' or 'error' (e.g., pop the last
conversationHistory entry or remove the matching item) and apply the same change
for the other observation/response block referenced (the region around lines
254-274) so retries do not accumulate duplicate context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6cdeeb6-227d-4d5c-aba6-31d45418d8a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1fc66ad and 5a17682.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • src/services/worker/GeminiAgent.ts
  • src/services/worker/OpenRouterAgent.ts
  • src/services/worker/SDKAgent.ts
  • src/services/worker/agents/ResponseProcessor.ts
  • src/services/worker/agents/index.ts
  • src/services/worker/agents/types.ts
  • tests/worker/agents/response-processor.test.ts

Comment thread CHANGELOG.md Outdated
Comment thread CHANGELOG.md Outdated
Comment thread src/services/worker/agents/ResponseProcessor.ts
Comment thread src/services/worker/agents/ResponseProcessor.ts Outdated
gchaix added a commit to gchaix/claude-mem that referenced this pull request Apr 7, 2026
- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space
  before the optional 's', so 'resets at 7pm' never matched; corrected
  to /\bresets?\s+(at\s+\d|in\s+\d)/i
- Preserve messages on malformed/truncated XML responses: the previous
  guard only fired when text had no XML markers; truncated XML like
  '<observation><type>bugfix' would pass the guard and fall through to
  storeObservations()+confirmProcessed(), recreating the message-loss
  bug; now also fires when XML markers are present but parser yields
  nothing (unless it's an intentional <skip_summary>-only response)
- Roll back OpenRouter conversationHistory on rate_limited or error:
  the prompt is pushed before the LLM call; on failure the session
  object is reused on the next retry claim, causing duplicate prompts
  to accumulate; pop the last entry before returning on error/rate_limited
- Add test for malformed/truncated XML preservation (25 tests, all pass)
@gchaix
Copy link
Copy Markdown
Author

gchaix commented Apr 7, 2026

Fixed in cdda789 (re: the outside-diff comment on OpenRouterAgent.ts lines 203-223 and 254-274).

The issue was real: session.conversationHistory is persisted on the in-memory ActiveSession object which is reused across retry claims (SessionManager.initializeSession returns the cached session if it already exists). So a prompt pushed before a failed LLM call would accumulate on subsequent retries.

Fix: added a pop() of the last conversationHistory entry immediately after processAgentResponse returns rate_limited or error, for both the observation and summary call sites.

gchaix added a commit to gchaix/claude-mem that referenced this pull request Apr 8, 2026
- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space
  before the optional 's', so 'resets at 7pm' never matched; corrected
  to /\bresets?\s+(at\s+\d|in\s+\d)/i
- Preserve messages on malformed/truncated XML responses: the previous
  guard only fired when text had no XML markers; truncated XML like
  '<observation><type>bugfix' would pass the guard and fall through to
  storeObservations()+confirmProcessed(), recreating the message-loss
  bug; now also fires when XML markers are present but parser yields
  nothing (unless it's an intentional <skip_summary>-only response)
- Roll back OpenRouter conversationHistory on rate_limited or error:
  the prompt is pushed before the LLM call; on failure the session
  object is reused on the next retry claim, causing duplicate prompts
  to accumulate; pop the last entry before returning on error/rate_limited
- Add test for malformed/truncated XML preservation (25 tests, all pass)
@gchaix gchaix force-pushed the fix/preserve-failed-observations branch from cdda789 to d8a24cb Compare April 8, 2026 16:55
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/services/worker/SDKAgent.ts (1)

264-284: ⚠️ Potential issue | 🟠 Major

Missing 'error' status handling and conversation history rollback.

SDKAgent only handles 'rate_limited' status but ignores 'error' status. Per ResponseProcessor.ts (context snippet 1), processAgentResponse returns { status: 'error' } for:

  • Empty responses with pending messages (line 153)
  • Auth errors (line 194)
  • Malformed/truncated XML (line 203)

Additionally, OpenRouterAgent (context snippet 2, lines 218-226) handles both statuses and rolls back the conversation history:

if (obsResult.status === 'rate_limited' || obsResult.status === 'error') {
  session.conversationHistory.pop();
}

Without this handling:

  1. Auth errors and malformed XML responses are silently ignored, allowing the loop to continue with corrupted state
  2. The pushed assistant response accumulates in conversationHistory on retries, causing context drift
Suggested fix (align with OpenRouterAgent pattern)
           // Parse and process response using shared ResponseProcessor
           const result = await processAgentResponse(
             textContent,
             session,
             this.dbManager,
             this.sessionManager,
             worker,
             discoveryTokens,
             originalTimestamp,
             'SDK',
             cwdTracker.lastCwd,
             modelId
           );

+          // Roll back conversation history on error/rate-limit to prevent accumulation
+          if (result.status === 'rate_limited' || result.status === 'error') {
+            session.conversationHistory.pop();
+          }
+
           // Rate-limited: abort — subsequent calls will also fail
           if (result.status === 'rate_limited') {
             logger.warn('SDK', 'Aborting session due to rate limiting', {
               sessionDbId: session.sessionDbId
             });
             session.abortController.abort();
             return;
           }
+
+          // Error: abort — auth errors and malformed responses indicate non-recoverable state
+          if (result.status === 'error') {
+            logger.warn('SDK', 'Aborting session due to error response', {
+              sessionDbId: session.sessionDbId
+            });
+            session.abortController.abort();
+            return;
+          }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/SDKAgent.ts` around lines 264 - 284, process the 'error'
status returned by processAgentResponse in SDKAgent the same way OpenRouterAgent
does: after calling processAgentResponse (in SDKAgent.ts) check if result.status
=== 'rate_limited' || result.status === 'error' and if so pop the last message
from session.conversationHistory; if result.status === 'rate_limited'
additionally log the warning and abort via session.abortController.abort() and
return, whereas for 'error' just rollback the conversationHistory and return so
retries don't accumulate corrupted assistant messages.
src/services/worker/agents/ResponseProcessor.ts (1)

139-154: ⚠️ Potential issue | 🟠 Major

Conversation history entry is not rolled back on error paths.

Line 141 pushes the assistant text to conversationHistory unconditionally (when text is non-empty). However, for the unproductive response guards (lines 179-203), the function returns early after the push but without popping the entry. This leaves error/rate-limit response text in the conversation history, which can cause accumulation on retries.

The guards at lines 147-153 are fine since empty responses don't push anything. But the non-empty error paths need rollback.

Suggested fix
     if (isRateLimitResponse(text)) {
       logger.warn('PARSER', `${agentName} returned rate-limit response; messages preserved for retry`, {
         sessionId: session.sessionDbId,
         preview
       });
+      // Roll back the assistant response we just pushed
+      session.conversationHistory.pop();
       preserveMessages(session, sessionManager, worker, 'rate_limit');
       return { status: 'rate_limited', observationCount: 0, summaryStored: false };
     }
 
     if (isAuthErrorResponse(text)) {
       logger.error('PARSER', `${agentName} returned auth error response; messages preserved for retry`, {
         sessionId: session.sessionDbId,
         preview
       });
+      // Roll back the assistant response we just pushed
+      session.conversationHistory.pop();
       preserveMessages(session, sessionManager, worker, 'auth_error');
       return { status: 'error', observationCount: 0, summaryStored: false };
     }
 
     const reason = hasXmlMarkers ? 'malformed_xml' : 'non_xml';
     logger.warn('PARSER', `${agentName} returned ${reason === 'malformed_xml' ? 'malformed/truncated XML' : 'non-XML'} response; messages preserved for retry`, {
       sessionId: session.sessionDbId,
       preview
     });
+    // Roll back the assistant response we just pushed
+    session.conversationHistory.pop();
     preserveMessages(session, sessionManager, worker, reason);
     return { status: 'error', observationCount: 0, summaryStored: false };

Alternatively, if the rollback is intended to be handled by callers (as mentioned in PR comments), then ResponseProcessor should document this contract clearly—but the context snippet 2 shows OpenRouterAgent doing the pop at the call site, which creates inconsistency with this centralized approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/agents/ResponseProcessor.ts` around lines 139 - 154, The
code pushes assistant text into session.conversationHistory via
session.conversationHistory.push({ role: 'assistant', content: text }) before
several early-return error guards, which leaves erroneous/rate-limit responses
in history on retry; fix by ensuring any early-return paths (e.g., the
empty-response guard that calls preserveMessages(session, sessionManager,
worker, 'empty_response') and others that return { status: 'error', ... })
remove the last pushed entry (pop) or move the push until after all guards
succeed so conversationHistory is only mutated on final successful responses,
keeping the behavior consistent with callers like OpenRouterAgent that currently
pop on error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/services/worker/agents/ResponseProcessor.ts`:
- Around line 139-154: The code pushes assistant text into
session.conversationHistory via session.conversationHistory.push({ role:
'assistant', content: text }) before several early-return error guards, which
leaves erroneous/rate-limit responses in history on retry; fix by ensuring any
early-return paths (e.g., the empty-response guard that calls
preserveMessages(session, sessionManager, worker, 'empty_response') and others
that return { status: 'error', ... }) remove the last pushed entry (pop) or move
the push until after all guards succeed so conversationHistory is only mutated
on final successful responses, keeping the behavior consistent with callers like
OpenRouterAgent that currently pop on error.

In `@src/services/worker/SDKAgent.ts`:
- Around line 264-284: process the 'error' status returned by
processAgentResponse in SDKAgent the same way OpenRouterAgent does: after
calling processAgentResponse (in SDKAgent.ts) check if result.status ===
'rate_limited' || result.status === 'error' and if so pop the last message from
session.conversationHistory; if result.status === 'rate_limited' additionally
log the warning and abort via session.abortController.abort() and return,
whereas for 'error' just rollback the conversationHistory and return so retries
don't accumulate corrupted assistant messages.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 285b78d1-52d1-4fb7-a267-b0b2b5985c8c

📥 Commits

Reviewing files that changed from the base of the PR and between cdda789 and d8a24cb.

📒 Files selected for processing (7)
  • src/services/worker/GeminiAgent.ts
  • src/services/worker/OpenRouterAgent.ts
  • src/services/worker/SDKAgent.ts
  • src/services/worker/agents/ResponseProcessor.ts
  • src/services/worker/agents/index.ts
  • src/services/worker/agents/types.ts
  • tests/worker/agents/response-processor.test.ts
✅ Files skipped from review due to trivial changes (2)
  • src/services/worker/agents/types.ts
  • src/services/worker/GeminiAgent.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/services/worker/agents/index.ts
  • src/services/worker/OpenRouterAgent.ts
  • tests/worker/agents/response-processor.test.ts

coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 8, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 8, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes Apr 8, 2026
gchaix added a commit to gchaix/claude-mem that referenced this pull request Apr 9, 2026
- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space
  before the optional 's', so 'resets at 7pm' never matched; corrected
  to /\bresets?\s+(at\s+\d|in\s+\d)/i
- Preserve messages on malformed/truncated XML responses: the previous
  guard only fired when text had no XML markers; truncated XML like
  '<observation><type>bugfix' would pass the guard and fall through to
  storeObservations()+confirmProcessed(), recreating the message-loss
  bug; now also fires when XML markers are present but parser yields
  nothing (unless it's an intentional <skip_summary>-only response)
- Roll back OpenRouter conversationHistory on rate_limited or error:
  the prompt is pushed before the LLM call; on failure the session
  object is reused on the next retry claim, causing duplicate prompts
  to accumulate; pop the last entry before returning on error/rate_limited
- Add test for malformed/truncated XML preservation (25 tests, all pass)
gchaix added 6 commits April 10, 2026 10:22
…onses

Previously, processAgentResponse() called confirmProcessed() (which DELETEs
pending_messages rows) regardless of whether the LLM actually produced
parseable XML. Any non-XML response — auth errors, rate limits, empty
responses, or garbled output — would silently delete the queued tool
interactions with zero observations stored.

This was observed in production when stale auth tokens caused the Claude SDK
to return '401 Invalid bearer token' as response text. Hours of observations
were permanently lost.

Changes:
- Add ProcessAgentResponseResult type (status: ok|rate_limited|error|empty)
  so callers can inspect what happened
- Add isRateLimitResponse() and isAuthErrorResponse() classification helpers
- Add preserveMessages() helper that calls markFailed() on all in-flight
  message IDs, routing them through the existing retry logic (up to 3
  attempts) rather than silently deleting them
- processAgentResponse() now returns early with markFailed() on all non-XML
  paths: empty response with pending messages, rate-limit text, auth error
  text, and any other unknown non-XML output
- All three agent call sites (SDKAgent, GeminiAgent, OpenRouterAgent) updated
  to capture the return value and abort the session on rate_limited to prevent
  wasting further LLM calls
- GeminiAgent's existing "leave for stale recovery" pattern (5-min wait)
  replaced by direct markFailed() calls for immediate retry
- Tests updated: non-XML test now asserts markFailed() is called and
  storeObservations() is NOT called; 7 new test cases cover rate-limit,
  auth error, empty-with-ids, empty-without-ids, and detection functions
- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space
  before the optional 's', so 'resets at 7pm' never matched; corrected
  to /\bresets?\s+(at\s+\d|in\s+\d)/i
- Preserve messages on malformed/truncated XML responses: the previous
  guard only fired when text had no XML markers; truncated XML like
  '<observation><type>bugfix' would pass the guard and fall through to
  storeObservations()+confirmProcessed(), recreating the message-loss
  bug; now also fires when XML markers are present but parser yields
  nothing (unless it's an intentional <skip_summary>-only response)
- Roll back OpenRouter conversationHistory on rate_limited or error:
  the prompt is pushed before the LLM call; on failure the session
  object is reused on the next retry claim, causing duplicate prompts
  to accumulate; pop the last entry before returning on error/rate_limited
- Add test for malformed/truncated XML preservation (25 tests, all pass)
Two related issues:

1. ResponseProcessor was pushing the assistant response to
   conversationHistory before the early-return guards. On any error path
   (empty response, rate limit, auth error, non-XML, malformed XML) the
   bad response text was left in history, so the next retry attempt would
   carry corrupted context.

   Fix: move the push to after all guards succeed. Callers that push a
   user prompt before calling processAgentResponse can then pop their
   push (one level) on non-ok status to keep history clean.

2. SDKAgent handled rate_limited by aborting but did not pop
   conversationHistory on either rate_limited or error. On error it also
   continued rather than returning, allowing subsequent loop iterations to
   run after a failed response.

   Fix: pop conversationHistory on both rate_limited and error (mirrors
   what OpenRouterAgent already does), then return immediately on error
   (matching the abort-and-return on rate_limited).

Adds three new tests asserting conversationHistory is empty after
empty-response, non-XML, and rate-limit error paths. 28 tests, all pass.
The `return` on `result.status === 'error'` added in d184b67 caused
the SDK subprocess to be killed on every empty response, triggering
crash-recovery loops that burned through the 3-restart limit.

This was observed in production: after a worker restart, INIT prompts
returned empty responses transiently. The `return` killed the generator,
crash-recovery restarted it, another empty response killed it again,
and after 3 cycles all pending messages were permanently stuck as failed.

Fix: remove the `return` on error so the generator loop continues to
the next message. The failed messages are already preserved via
markFailed() for retry. Only rate_limited warrants aborting, since
subsequent API calls will also fail.

Before this commit (d184b67): error → return → kill subprocess → restart
After this commit: error → continue loop → process next message
…race fix

- PendingMessageStore: add drainSessionMessages() — on session end, requeues
  messages with retries remaining instead of permanently failing them all.
  Deprecates markAllSessionMessagesAbandoned().

- worker-service + SessionCompletionHandler: migrate both drain call sites to
  drainSessionMessages(), logging { failed, requeued } counts.

- ResponseProcessor: treat empty LLM response as intentional skip (confirmed
  + deleted) instead of an error requiring retry. Adds 'skipped' to the
  ProcessAgentResponseResult status union.

- SessionRoutes: fix race condition in handleStatusByClaudeId — when session
  is not found in the active map, fall back to DB getPendingCount() instead of
  hardcoding queueLength: 0. This was causing the Stop hook to exit early
  before summarize messages were processed, explaining low summary counts.

- summarize.ts: add status?: string to queue-status type assertion to match
  the race fix above.

- tests: update response-processor tests for new skip semantics; add 8-test
  suite for drainSessionMessages() covering retries, exhaustion, isolation,
  and edge cases.

- scripts: add inspect-failed.sh and requeue-failed.sh for operational
  debugging of stuck pending_messages.
…empty observations

- tag-stripping: add TASK_NOTIFICATION_REGEX to strip <task-notification> blocks
  from user prompts. Claude Code injects these into the UserPromptSubmit hook
  when a background task completes at prompt submission time; they are system
  events, not user content, and were being persisted as prompt records.

- prompts.ts: strengthen summary mode-switch header with visual === banners,
  CRITICAL/STOP language, and a closing reminder. The observer subprocess was
  returning <observation> tags for summary requests due to established
  conversation context overriding a weak mode-switch instruction.

- parser.ts: skip observation shells with no title, narrative, or facts before
  storage. The LLM occasionally emits <observation><type>bugfix</type></observation>
  placeholders with no content, which appeared in the UI as blank "Untitled" cards.
  The "ALWAYS save observations" directive applies to observations with content.
@gchaix
Copy link
Copy Markdown
Author

gchaix commented Apr 10, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/sdk/parser.ts (1)

167-174: ⚠️ Potential issue | 🟡 Minor

Preserve notes-only summaries.

notes is a first-class field in ParsedSummary, but this false-positive guard ignores it. <summary><notes>...</notes></summary> now returns null even though the comment above says summaries should still be saved when fields are missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sdk/parser.ts` around lines 167 - 174, The false-positive guard in the
parser uses the condition if (!request && !investigated && !learned &&
!completed && !next_steps) which ignores the first-class ParsedSummary field
notes and causes summaries that only contain <notes> to be dropped; update the
guard to include notes (e.g., if (!request && !investigated && !learned &&
!completed && !next_steps && !notes) so summaries with notes are preserved,
leaving the logger.warn and return null behavior unchanged when truly empty.
src/services/worker/OpenRouterAgent.ts (1)

203-228: ⚠️ Potential issue | 🟠 Major

Roll back prompts when the OpenRouter call throws, not just on parsed error statuses.

These pops only run after processAgentResponse() returns. If queryOpenRouterMultiTurn() throws for 401/429/network/API errors, the prompt pushed at Line 190 or Line 246 stays in conversationHistory, and the next retry replays stale context. Add a local rollback path around each request as well.

Also applies to: 259-284

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/OpenRouterAgent.ts` around lines 203 - 228, The code only
rolls back the prompt after processAgentResponse() returns, so if
queryOpenRouterMultiTurn() throws (401/429/network/API errors) the pushed prompt
remains in session.conversationHistory; wrap each call to
queryOpenRouterMultiTurn() in a try/catch (both the observation call and the
later call referenced at the other site) and in the catch block immediately pop
the last entry from session.conversationHistory to undo the push, then log the
error and rethrow or return consistent with existing error handling (mirroring
the existing rollback/return logic used when obsResult.status indicates
rate_limited/error); reference queryOpenRouterMultiTurn, processAgentResponse,
and session.conversationHistory.pop for locating the changes.
src/services/worker/SDKAgent.ts (1)

264-297: ⚠️ Potential issue | 🟠 Major

The stale-prompt rollback still misses thrown auth failures.

This only pops when processAgentResponse() returns rate_limited or error. The explicit Invalid API key throw at Line 259 happens after the current user prompt has already been pushed in createMessageGenerator(), so retries on the same ActiveSession can still accumulate duplicate prompts. The rollback needs to run in the per-message exception path too.

src/services/worker/GeminiAgent.ts (1)

176-194: ⚠️ Potential issue | 🟠 Major

Rollback the failed turn from conversationHistory before returning.

These early returns leave the just-appended prompt/response in the cached ActiveSession, so the next retry replays duplicate context. That is the exact retry-history leak called out in the PR notes, and it still happens here for init/observation/summary failures.

♻️ Suggested fix
+      const initHistoryBase = session.conversationHistory.length;
       session.conversationHistory.push({ role: 'user', content: initPrompt });
       const initResponse = await this.queryGeminiMultiTurn(session.conversationHistory, apiKey, model, rateLimitingEnabled);

       if (initResponse.content) {
         session.conversationHistory.push({ role: 'assistant', content: initResponse.content });
         // ...
         const initResult = await processAgentResponse(/* ... */);

         if (initResult.status === 'rate_limited' || initResult.status === 'error') {
+          session.conversationHistory.splice(initHistoryBase);
           logger.warn('SDK', `Gemini init response failed (${initResult.status}), aborting session`, {
             sessionId: session.sessionDbId
           });
           return;
         }
       }

Apply the same historyBase/splice() pattern to the observation and summary turns before any non-ok early return.

Also applies to: 255-276, 307-328

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/GeminiAgent.ts` around lines 176 - 194, The early returns
after calling processAgentResponse (e.g., handling initResult.status ===
'rate_limited' || 'error') leave the just-appended turn in the
ActiveSession.conversationHistory; before any non-'ok' return you must rollback
that failed turn using the same historyBase/splice pattern: capture historyBase
prior to appending (or compute index where the new turn(s) were pushed) and call
conversationHistory.splice(historyBase) to remove the appended
init/observation/summary turn(s) so retries don’t replay duplicate context;
apply this fix in the init, observation, and summary handling blocks around
processAgentResponse in GeminiAgent (use the existing historyBase variable and
splice() pattern already used elsewhere).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/inspect-failed.sh`:
- Around line 71-84: The SQL uses unixepoch on millisecond timestamps and
references a non-existent column; update the datetime conversions to divide ms
by 1000 (e.g. datetime(created_at_epoch/1000, 'unixepoch', 'localtime')) and
replace the invalid failed_at_epoch reference with the actual schema column (or
a safe COALESCE fallback) in the SELECT for the pending_messages query inside
scripts/inspect-failed.sh; ensure both created_at_epoch and the
failure-timestamp column you choose are divided by 1000 and wrapped with
COALESCE(..., '—') so the while read loop (id, sess_id, msg_type, ...) can run
without errors.

In `@src/sdk/parser.ts`:
- Around line 53-62: The current emptiness guard uses hasContent = !!(title ||
narrative || facts.length > 0) which wrongly treats observations that only have
persisted fields like files_read or concept entries as empty; update the
hasContent check in src/sdk/parser.ts to also include all other persisted
observation fields (e.g., files_read/filesRead, concepts, references,
attachments or any other stored arrays/objects) using proper truthiness checks
(e.g., array.length > 0 or Object.keys(obj).length > 0) so those observations
are not skipped, and adjust the logger message near the hasContent check (and
any related logic in ResponseProcessor) to reflect which fields were considered
when deciding emptiness.

In `@src/services/sqlite/PendingMessageStore.ts`:
- Around line 297-303: The UPDATE uses a non-existent column failed_at_epoch on
the pending_messages table; change the UPDATE in failStmt (and any related uses)
to set the existing terminal timestamp column completed_at_epoch instead, i.e.
use completed_at_epoch = ? with the same now parameter, or if you intend a
separate failure timestamp add a DB migration to create failed_at_epoch and
update all readers/writers consistently (refer to failStmt, pending_messages,
failed_at_epoch, completed_at_epoch, and this.maxRetries to locate the code).

In `@src/services/worker/agents/ResponseProcessor.ts`:
- Around line 189-205: The auth-path in ResponseProcessor currently treats
isAuthErrorResponse(text) like a generic error; update it to return a distinct
fatal status (e.g., return { status: 'auth_error', observationCount: 0,
summaryStored: false }) instead of { status: 'error' ... } so callers can abort
retries; keep the preserveMessages(session, sessionManager, worker,
'auth_error') call and the existing log, and ensure any callers that switch on
returned status can handle the 'auth_error' value.
- Around line 139-155: The code currently treats any empty text as an
intentional skip and immediately confirms/clears session.processingMessageIds
(using pendingStore.confirmProcessed and cleanupProcessedMessages), which can
drop real queued work; change the guard in ResponseProcessor (the block that
checks if (!text.trim() && session.processingMessageIds.length > 0)) to only
treat an empty response as a skip when the observation includes an explicit skip
marker (e.g., observation.intentionalSkip or a new boolean returned by
OpenRouterAgent.queryOpenRouterMultiTurn), and otherwise do not call
pendingStore.confirmProcessed or clear session.processingMessageIds—leave the
messages for retry/inspection and return a non-skipped error status; update the
agent call (OpenRouterAgent.queryOpenRouterMultiTurn) to propagate this explicit
flag if necessary and adjust logging to reflect whether the empty response was
an explicit skip or an upstream failure.

---

Outside diff comments:
In `@src/sdk/parser.ts`:
- Around line 167-174: The false-positive guard in the parser uses the condition
if (!request && !investigated && !learned && !completed && !next_steps) which
ignores the first-class ParsedSummary field notes and causes summaries that only
contain <notes> to be dropped; update the guard to include notes (e.g., if
(!request && !investigated && !learned && !completed && !next_steps && !notes)
so summaries with notes are preserved, leaving the logger.warn and return null
behavior unchanged when truly empty.

In `@src/services/worker/GeminiAgent.ts`:
- Around line 176-194: The early returns after calling processAgentResponse
(e.g., handling initResult.status === 'rate_limited' || 'error') leave the
just-appended turn in the ActiveSession.conversationHistory; before any non-'ok'
return you must rollback that failed turn using the same historyBase/splice
pattern: capture historyBase prior to appending (or compute index where the new
turn(s) were pushed) and call conversationHistory.splice(historyBase) to remove
the appended init/observation/summary turn(s) so retries don’t replay duplicate
context; apply this fix in the init, observation, and summary handling blocks
around processAgentResponse in GeminiAgent (use the existing historyBase
variable and splice() pattern already used elsewhere).

In `@src/services/worker/OpenRouterAgent.ts`:
- Around line 203-228: The code only rolls back the prompt after
processAgentResponse() returns, so if queryOpenRouterMultiTurn() throws
(401/429/network/API errors) the pushed prompt remains in
session.conversationHistory; wrap each call to queryOpenRouterMultiTurn() in a
try/catch (both the observation call and the later call referenced at the other
site) and in the catch block immediately pop the last entry from
session.conversationHistory to undo the push, then log the error and rethrow or
return consistent with existing error handling (mirroring the existing
rollback/return logic used when obsResult.status indicates rate_limited/error);
reference queryOpenRouterMultiTurn, processAgentResponse, and
session.conversationHistory.pop for locating the changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5e764ab7-2e70-47e0-9e08-187ff4659353

📥 Commits

Reviewing files that changed from the base of the PR and between d184b67 and 6697aac.

📒 Files selected for processing (21)
  • plugin/scripts/mcp-server.cjs
  • plugin/scripts/worker-service.cjs
  • plugin/ui/viewer-bundle.js
  • scripts/inspect-failed.sh
  • scripts/requeue-failed.sh
  • src/cli/handlers/summarize.ts
  • src/sdk/parser.ts
  • src/sdk/prompts.ts
  • src/services/sqlite/PendingMessageStore.ts
  • src/services/worker-service.ts
  • src/services/worker/GeminiAgent.ts
  • src/services/worker/OpenRouterAgent.ts
  • src/services/worker/SDKAgent.ts
  • src/services/worker/agents/ResponseProcessor.ts
  • src/services/worker/agents/index.ts
  • src/services/worker/agents/types.ts
  • src/services/worker/http/routes/SessionRoutes.ts
  • src/services/worker/session/SessionCompletionHandler.ts
  • src/utils/tag-stripping.ts
  • tests/services/sqlite/pending-message-drain.test.ts
  • tests/worker/agents/response-processor.test.ts
✅ Files skipped from review due to trivial changes (1)
  • src/cli/handlers/summarize.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/services/worker/agents/index.ts
  • src/services/worker/agents/types.ts

Comment thread scripts/inspect-failed.sh
Comment on lines +71 to +84
sqlite3 -separator $'\t' "$DB" "
SELECT
id,
content_session_id,
message_type,
COALESCE(tool_name, '—'),
retry_count,
COALESCE(datetime(created_at_epoch, 'unixepoch', 'localtime'), '—'),
COALESCE(datetime(failed_at_epoch, 'unixepoch', 'localtime'), '—'),
COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '—')
FROM pending_messages
WHERE ${WHERE}
ORDER BY id;
" | while IFS=$'\t' read -r id sess_id msg_type tool retries created failed preview; do
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

The inspection query is using the wrong timestamp fields and units.

pending_messages stores Date.now()-style millisecond epochs, so datetime(..., 'unixepoch') prints bogus dates unless you divide by 1000. On top of that, failed_at_epoch is not in the current schema, so this query will error before the script can display anything.

🛠️ Suggested fix
     COALESCE(tool_name, '—'),
     retry_count,
-    COALESCE(datetime(created_at_epoch, 'unixepoch', 'localtime'), '—'),
-    COALESCE(datetime(failed_at_epoch,   'unixepoch', 'localtime'), '—'),
+    COALESCE(datetime(created_at_epoch / 1000, 'unixepoch', 'localtime'), '—'),
+    COALESCE(datetime(completed_at_epoch / 1000, 'unixepoch', 'localtime'), '—'),
     COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '—')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sqlite3 -separator $'\t' "$DB" "
SELECT
id,
content_session_id,
message_type,
COALESCE(tool_name, ''),
retry_count,
COALESCE(datetime(created_at_epoch, 'unixepoch', 'localtime'), ''),
COALESCE(datetime(failed_at_epoch, 'unixepoch', 'localtime'), ''),
COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '')
FROM pending_messages
WHERE ${WHERE}
ORDER BY id;
" | while IFS=$'\t' read -r id sess_id msg_type tool retries created failed preview; do
sqlite3 -separator $'\t' "$DB" "
SELECT
id,
content_session_id,
message_type,
COALESCE(tool_name, ''),
retry_count,
COALESCE(datetime(created_at_epoch / 1000, 'unixepoch', 'localtime'), ''),
COALESCE(datetime(completed_at_epoch / 1000, 'unixepoch', 'localtime'), ''),
COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '')
FROM pending_messages
WHERE ${WHERE}
ORDER BY id;
" | while IFS=$'\t' read -r id sess_id msg_type tool retries created failed preview; do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/inspect-failed.sh` around lines 71 - 84, The SQL uses unixepoch on
millisecond timestamps and references a non-existent column; update the datetime
conversions to divide ms by 1000 (e.g. datetime(created_at_epoch/1000,
'unixepoch', 'localtime')) and replace the invalid failed_at_epoch reference
with the actual schema column (or a safe COALESCE fallback) in the SELECT for
the pending_messages query inside scripts/inspect-failed.sh; ensure both
created_at_epoch and the failure-timestamp column you choose are divided by 1000
and wrapped with COALESCE(..., '—') so the while read loop (id, sess_id,
msg_type, ...) can run without errors.

Comment thread src/sdk/parser.ts
Comment on lines +53 to +62
// Guard: skip observations with no meaningful content (title, narrative, facts all empty).
// These are LLM-generated shells — e.g. <observation><type>bugfix</type></observation> —
// that provide no value and pollute the UI with blank "Untitled" cards.
// NOTE: "ALWAYS save observations" (10/24/2025) applies to observations with content;
// it was never intended to preserve completely empty shells.
const hasContent = !!(title || narrative || facts.length > 0);
if (!hasContent) {
logger.warn('PARSER', 'Skipping empty observation shell (no title, narrative, or facts)', { correlationId, type });
continue;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't treat file-only or concept-only observations as empty.

This guard only counts title, narrative, and facts as meaningful content. A structured response like <observation><type>bugfix</type><files_read>...</files_read></observation> now gets dropped here, and ResponseProcessor will classify the whole response as malformed/empty instead of storing it or treating it as a valid observation. Include the other persisted fields in the emptiness check too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sdk/parser.ts` around lines 53 - 62, The current emptiness guard uses
hasContent = !!(title || narrative || facts.length > 0) which wrongly treats
observations that only have persisted fields like files_read or concept entries
as empty; update the hasContent check in src/sdk/parser.ts to also include all
other persisted observation fields (e.g., files_read/filesRead, concepts,
references, attachments or any other stored arrays/objects) using proper
truthiness checks (e.g., array.length > 0 or Object.keys(obj).length > 0) so
those observations are not skipped, and adjust the logger message near the
hasContent check (and any related logic in ResponseProcessor) to reflect which
fields were considered when deciding emptiness.

Comment on lines +297 to +303
const failStmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'failed', failed_at_epoch = ?
WHERE session_db_id = ? AND status IN ('pending', 'processing')
AND retry_count >= ?
`);
const failResult = failStmt.run(now, sessionDbId, this.maxRetries);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use the schema’s terminal timestamp column here.

pending_messages does not have a failed_at_epoch column in the current schema; it only has completed_at_epoch. This update will fail as soon as a drained message has exhausted retries, and the new test/script changes are already assuming the same nonexistent column downstream.

🛠️ Suggested fix
     const failStmt = this.db.prepare(`
       UPDATE pending_messages
-      SET status = 'failed', failed_at_epoch = ?
+      SET status = 'failed', completed_at_epoch = ?
       WHERE session_db_id = ? AND status IN ('pending', 'processing')
         AND retry_count >= ?
     `);

If you want a dedicated failure timestamp, add a migration first and update all readers/writers consistently.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const failStmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'failed', failed_at_epoch = ?
WHERE session_db_id = ? AND status IN ('pending', 'processing')
AND retry_count >= ?
`);
const failResult = failStmt.run(now, sessionDbId, this.maxRetries);
const failStmt = this.db.prepare(`
UPDATE pending_messages
SET status = 'failed', completed_at_epoch = ?
WHERE session_db_id = ? AND status IN ('pending', 'processing')
AND retry_count >= ?
`);
const failResult = failStmt.run(now, sessionDbId, this.maxRetries);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/sqlite/PendingMessageStore.ts` around lines 297 - 303, The
UPDATE uses a non-existent column failed_at_epoch on the pending_messages table;
change the UPDATE in failStmt (and any related uses) to set the existing
terminal timestamp column completed_at_epoch instead, i.e. use
completed_at_epoch = ? with the same now parameter, or if you intend a separate
failure timestamp add a DB migration to create failed_at_epoch and update all
readers/writers consistently (refer to failStmt, pending_messages,
failed_at_epoch, completed_at_epoch, and this.maxRetries to locate the code).

Comment on lines +139 to 155
// --- GUARD: Empty response with pending messages ---
// The observation prompt explicitly allows the LLM to return empty to skip trivial
// tool uses. Genuine API failures produce non-empty error text (rate-limit notices,
// auth errors, HTTP bodies), so an empty string here is an intentional skip, not a
// transient error. Confirm and delete the messages rather than retrying them.
if (!text.trim() && session.processingMessageIds.length > 0) {
const pendingStore = sessionManager.getPendingMessageStore();
for (const messageId of session.processingMessageIds) {
pendingStore.confirmProcessed(messageId);
}
logger.info('QUEUE', `SKIPPED | sessionDbId=${session.sessionDbId} | reason=intentional_skip | count=${session.processingMessageIds.length} | ids=[${session.processingMessageIds.join(',')}]`, {
sessionId: session.sessionDbId
});
session.processingMessageIds = [];
cleanupProcessedMessages(session, worker);
return { status: 'skipped', observationCount: 0, summaryStored: false };
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Empty responses can still drop queued messages.

This branch confirms and clears processingMessageIds before the caller can distinguish an intentional skip from an upstream failure. OpenRouterAgent.queryOpenRouterMultiTurn() already normalizes one API failure mode to content: '', so this path can still delete real queued work on empty provider responses. If empty strings need to mean “skip”, that signal should be explicit rather than inferred from !text.trim().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/agents/ResponseProcessor.ts` around lines 139 - 155, The
code currently treats any empty text as an intentional skip and immediately
confirms/clears session.processingMessageIds (using
pendingStore.confirmProcessed and cleanupProcessedMessages), which can drop real
queued work; change the guard in ResponseProcessor (the block that checks if
(!text.trim() && session.processingMessageIds.length > 0)) to only treat an
empty response as a skip when the observation includes an explicit skip marker
(e.g., observation.intentionalSkip or a new boolean returned by
OpenRouterAgent.queryOpenRouterMultiTurn), and otherwise do not call
pendingStore.confirmProcessed or clear session.processingMessageIds—leave the
messages for retry/inspection and return a non-skipped error status; update the
agent call (OpenRouterAgent.queryOpenRouterMultiTurn) to propagate this explicit
flag if necessary and adjust logging to reflect whether the empty response was
an explicit skip or an upstream failure.

Comment on lines +189 to +205
if (isAuthErrorResponse(text)) {
logger.error('PARSER', `${agentName} returned auth error response; messages preserved for retry`, {
sessionId: session.sessionDbId,
preview
});
preserveMessages(session, sessionManager, worker, 'auth_error');
return { status: 'error', observationCount: 0, summaryStored: false };
}

const reason = hasXmlMarkers ? 'malformed_xml' : 'non_xml';
logger.warn('PARSER', `${agentName} returned ${reason === 'malformed_xml' ? 'malformed/truncated XML' : 'non-XML'} response; messages preserved for retry`, {
sessionId: session.sessionDbId,
preview
});
preserveMessages(session, sessionManager, worker, reason);
return { status: 'error', observationCount: 0, summaryStored: false };
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Auth failures need a distinct result status.

isAuthErrorResponse() identifies a deterministic credential/configuration failure, but this still returns status: 'error'. The current callers only abort on rate_limited, so an expired token can burn one retry for every queued message in the same session before anything stops. Returning a dedicated auth_error/fatal status would let agents abort immediately while keeping transient parse errors retryable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/agents/ResponseProcessor.ts` around lines 189 - 205, The
auth-path in ResponseProcessor currently treats isAuthErrorResponse(text) like a
generic error; update it to return a distinct fatal status (e.g., return {
status: 'auth_error', observationCount: 0, summaryStored: false }) instead of {
status: 'error' ... } so callers can abort retries; keep the
preserveMessages(session, sessionManager, worker, 'auth_error') call and the
existing log, and ensure any callers that switch on returned status can handle
the 'auth_error' value.

@thedotmack
Copy link
Copy Markdown
Owner

Closed during the April 2026 backlog cleanup. The underlying bug is now tracked in #1874, which is the single canonical issue being addressed by the maintainer. Thanks for taking the time to report — your symptoms and repro are captured in the consolidated ticket.

@thedotmack thedotmack closed this Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants