fix: preserve pending messages when LLM returns non-XML or error responses by gchaix · Pull Request #1643 · thedotmack/claude-mem

gchaix · 2026-04-07T22:10:10Z

processAgentResponse() called confirmProcessed() (which DELETEs the pending_messages row) regardless of whether the LLM actually returned parseable XML. Any unexpected response — auth errors, rate limits, empty responses, garbled output — silently deleted the queued tool interactions with zero observations stored.

This was observed in production when stale auth tokens caused the Claude SDK to return 401 Invalid bearer token as response text. The parser found no XML, the warning was logged, and then confirmProcessed() deleted the queue entry.

What Changed

ResponseProcessor.ts — the core fix:

Added isRateLimitResponse() and isAuthErrorResponse() classification helpers
Added preserveMessages() — calls markFailed() on all in-flight message IDs, routing them through the existing retry logic (up to 3 retries, then permanently failed but visible in the queue UI) rather than silently deleting them
processAgentResponse() now returns ProcessAgentResponseResult (status: ok | rate_limited | error | empty) instead of void
All non-XML paths now return early via preserveMessages():
- Empty response with pending messages → error
- Rate-limit text → rate_limited
- Auth error text → error
- Unknown non-XML → error

Agent call sites — all 6 processAgentResponse calls updated:

SDKAgent.ts — captures result, aborts session on rate_limited
GeminiAgent.ts — captures result at all 3 call sites; replaces the "leave for stale recovery" pattern (5-min wait) with immediate markFailed() calls
OpenRouterAgent.ts — captures result at both call sites, aborts on rate_limited

types.ts — adds ProcessAgentResponseResult interface

index.ts — exports ProcessAgentResponseResult, isRateLimitResponse, isAuthErrorResponse

Tests

24 tests pass (bun test tests/worker/agents/response-processor.test.ts).

Updated existing tests:

Non-XML response test now asserts markFailed() is called and storeObservations() is not called

New tests:

Rate-limit text → markFailed() called, status rate_limited
Auth error text → markFailed() called, status error
Empty response with pending messages → markFailed() called, status error
Empty response with no pending messages (init prompt) → storeObservations() called normally, status empty
Plain-text non-XML with pending messages → markFailed() called, status error
isRateLimitResponse() unit tests
isAuthErrorResponse() unit tests

Existing Safety Properties Preserved

confirmProcessed() is still called only after successful storeObservations() on the happy path
Init prompts (no pending messages) still call storeObservations() with 0 observations — no session timestamp side-effects
markFailed() retries up to 3× then marks permanently failed (already the behaviour for rate-limit errors in OpenRouter batch processing)
The startup self-heal (resetStaleProcessingMessages(0)) and the 5-minute stale claim reset are unchanged

coderabbitai · 2026-04-07T22:10:30Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Summary by CodeRabbit

Release Notes

New Features
- Added CLI tools to inspect, filter, delete, and requeue failed messages from the queue.
Bug Fixes
- Improved handling of rate-limited and authentication error responses with proper session cleanup.
- Empty observations are now skipped during processing.
- Added conversation history rollback on error and rate-limit responses.
Documentation
- Clarified summary prompt instructions regarding observation output handling.
Chores
- Enhanced tag stripping to remove task notification blocks from responses.

Walkthrough

Call sites now await and inspect a structured ProcessAgentResponseResult from processAgentResponse. The processor classifies plain-text rate-limit/auth errors, preserves or fails pending messages, and returns detailed statuses so agents can abort, rollback, or continue without committing assistant text on early/error paths.

Changes

Cohort / File(s)	Summary
Agent callers `src/services/worker/GeminiAgent.ts`, `src/services/worker/OpenRouterAgent.ts`, `src/services/worker/SDKAgent.ts`	Callers now store `processAgentResponse` results and branch on `status`. Agents abort or return early on `rate_limited`, rollback recent conversation entries on `rate_limited`/`error`, and avoid committing assistant text on early/error outcomes.
Response processing core `src/services/worker/agents/ResponseProcessor.ts`, `src/services/worker/agents/types.ts`, `src/services/worker/agents/index.ts`	`processAgentResponse` returns `ProcessAgentResponseResult` (added type), introduces `isRateLimitResponse`/`isAuthErrorResponse`, implements `preserveMessages`/mark-failed behavior, distinguishes `skipped`/`empty`/`error`/`rate_limited`/`ok`, and avoids updating conversation history on early/error paths; exports updated.
Pending message store & draining `src/services/sqlite/PendingMessageStore.ts`, `tests/services/sqlite/pending-message-drain.test.ts`	Added `drainSessionMessages(sessionDbId)` to requeue or permanently fail messages based on retry counts; tests added to validate requeue, fail, and mixed scenarios.
Session lifecycle & routes `src/services/worker-service.ts`, `src/services/worker/session/SessionCompletionHandler.ts`, `src/services/worker/http/routes/SessionRoutes.ts`	Replaced abandoned-message logic with `drainSessionMessages` usage and updated logging; status route now computes `queueLength` from `pendingStore` when session is missing.
Tests (response processor) `tests/worker/agents/response-processor.test.ts`	Extended to cover classifiers, `markFailed` vs `confirmProcessed` behaviors, empty vs error outcomes depending on pending messages, and that conversation history is not updated on early/error/rate-limit paths; added unit tests for new classifiers.
CLI / scripts / SDK / utils `src/cli/handlers/summarize.ts`, `scripts/inspect-failed.sh`, `scripts/requeue-failed.sh`, `src/sdk/parser.ts`, `src/sdk/prompts.ts`, `src/utils/tag-stripping.ts`	Minor type tweak in summarize handler; added inspection and requeue helper scripts; `parseObservations` skips empty observations; summary prompt clarified to forbid `<observation>` output; added `<task-notification>` stripping and counting.

Sequence Diagram

sequenceDiagram
    participant Agent as Agent (Gemini / OpenRouter / SDK)
    participant RP as ResponseProcessor
    participant Store as PendingMessageStore
    participant Session as Session

    Agent->>RP: processAgentResponse(response, session,...)
    alt Empty/whitespace && processingMessageIds present
        RP->>Store: confirmProcessed or markFailed(processingMessageIds)
        RP-->>Agent: {status: 'skipped' or 'empty', observationCount:0, summaryStored:false}
    else Plain-text rate-limit/auth detected
        RP->>Store: markFailed(processingMessageIds)
        RP-->>Agent: {status: 'rate_limited'|'error', observationCount:0, summaryStored:false}
        Agent->>Session: abort/rollback or remove last conversation entry
    else Successful XML/observations parsed
        RP->>Store: storeObservations(parsedObservations)
        RP-->>Agent: {status: 'ok', observationCount:N, summaryStored:bool}
        Agent->>Session: commit conversation history and continue
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: isolate Claude and Codex session sources #1472 — touches ResponseProcessor and related agent response handling; likely overlaps type/behavior changes.
fix: worker startup crash, missing migration, and merge artifacts #1641 — modifies agent response processing and caller handling; strong code-level connection.
feat: drain orphaned pending messages on SIGTERM session completion #1567 — changes session-completion pending-message handling (markAllSessionMessagesAbandoned vs drain); related to drain semantics.

Poem

🐰 I sniffed the logs and hopped with glee,
Messages saved or marked where they should be,
Rate limits warned, rollbacks neat,
Processors tidy, no duplicate seat,
A rabbit cheers this orderly spree 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: preventing deletion of pending messages when LLM responses are non-XML or contain errors, which is the core fix implemented across all files in this PR.
Description check	✅ Passed	The description comprehensively explains the problem (messages were silently deleted on error/non-XML responses), the root cause (production incident with auth errors), and the detailed solution with code changes, test coverage, and preserved safety properties.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/services/worker/OpenRouterAgent.ts (1)

203-223: ⚠️ Potential issue | 🟠 Major

Rollback retry-preserved prompts from session.conversationHistory.

The prompt is already appended at Lines 190-191 and 241-242 before the LLM call. When processAgentResponse() returns 'rate_limited' or 'error', that queue item is preserved for retry, so leaving the prompt in session.conversationHistory means the retry adds a second copy and every later OpenRouter request carries duplicated context. Pop the just-added prompt, or only commit it after a confirmed response.

💡 Minimal fix

           );
+          if (obsResult.status === 'rate_limited' || obsResult.status === 'error') {
+            session.conversationHistory.pop();
+          }
 
           if (obsResult.status === 'rate_limited') {
             logger.warn('SDK', 'OpenRouter rate-limited during observation, aborting session', {
               sessionId: session.sessionDbId
             });
             return;
           }

           );
+          if (summaryResult.status === 'rate_limited' || summaryResult.status === 'error') {
+            session.conversationHistory.pop();
+          }
 
           if (summaryResult.status === 'rate_limited') {
             logger.warn('SDK', 'OpenRouter rate-limited during summary, aborting session', {
               sessionId: session.sessionDbId
             });
             return;
           }

Also applies to: 254-274

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/OpenRouterAgent.ts` around lines 203 - 223, The code
appends the prompt to session.conversationHistory before calling
processAgentResponse, but when processAgentResponse returns 'rate_limited' or
'error' the item is preserved for retry and the pre-appended prompt becomes
duplicated on the next attempt; locate the prompt append in the OpenRouterAgent
method (the block around the LLM call where session.conversationHistory is
mutated at the earlier append lines) and either (A) only push/commit the prompt
after processAgentResponse returns a successful outcome, or (B) immediately
remove the just-added prompt when obsResult.status is 'rate_limited' or 'error'
(e.g., pop the last conversationHistory entry or remove the matching item) and
apply the same change for the other observation/response block referenced (the
region around lines 254-274) so retries do not accumulate duplicate context.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 2509-2514: The repeated Markdown headings "## What's Changed" and
"## New Contributors" violate MD024 and lack the required blank line for MD022;
update those heading instances by making them unique (e.g., "## What's Changed —
vX.Y.Z" or "## New Contributors — vX.Y.Z") or convert them to non-heading text
(bold or plain) and ensure each heading is followed by a blank line; apply the
same change to other repeated blocks in the file to resolve both MD024 and
MD022.
- Around line 2241-2248: The markdown block around the Deleted/What remains
section is missing required blank lines, causing MD022/MD031; open CHANGELOG.md,
find the headings "### Deleted" and "### What remains", the fenced code block
that follows, and the "## Dependencies" heading, and insert a single blank line
before and after each of those headings and around the fenced code block so each
block is separated by blank lines (also apply the same fixes to the similar
block near the "2276-2277" area).

In `@src/services/worker/agents/ResponseProcessor.ts`:
- Around line 36-52: The regex in isRateLimitResponse is written as `/reset
s?/…/` which incorrectly requires a space between "reset" and the optional "s"
and therefore never matches phrases like "resets at 7pm"; update the pattern
used in isRateLimitResponse to match either "reset" or "resets" without a
spurious space (for example use a pattern equivalent to "resets?" followed by
whitespace and either "at <digit>" or "in <digit>" with case-insensitive
matching and optional word boundaries) so that strings like "resets at 7pm" and
"resets in 2 hours" are detected as rate-limited.
- Around line 160-196: The guard currently preserves only plain non-XML text,
allowing truncated/malformed XML (e.g. "<observation><type>...") to fall through
when observations.length === 0 and summary is falsy; update the logic in
ResponseProcessor (around variables text, observations, summary, agentName) to
also preserve messages whenever the parser produced no observations/summary even
if raw text contains XML markers: call preserveMessages(session, sessionManager,
worker, 'malformed_xml') and return error/rate_limited status as appropriate
(also reuse isRateLimitResponse/isAuthErrorResponse checks and logger calls), so
unparsable/truncated XML responses are never silently discarded before
storeObservations() / confirmProcessed().

---

Outside diff comments:
In `@src/services/worker/OpenRouterAgent.ts`:
- Around line 203-223: The code appends the prompt to
session.conversationHistory before calling processAgentResponse, but when
processAgentResponse returns 'rate_limited' or 'error' the item is preserved for
retry and the pre-appended prompt becomes duplicated on the next attempt; locate
the prompt append in the OpenRouterAgent method (the block around the LLM call
where session.conversationHistory is mutated at the earlier append lines) and
either (A) only push/commit the prompt after processAgentResponse returns a
successful outcome, or (B) immediately remove the just-added prompt when
obsResult.status is 'rate_limited' or 'error' (e.g., pop the last
conversationHistory entry or remove the matching item) and apply the same change
for the other observation/response block referenced (the region around lines
254-274) so retries do not accumulate duplicate context.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6cdeeb6-227d-4d5c-aba6-31d45418d8a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1fc66ad and 5a17682.

📒 Files selected for processing (8)

CHANGELOG.md
src/services/worker/GeminiAgent.ts
src/services/worker/OpenRouterAgent.ts
src/services/worker/SDKAgent.ts
src/services/worker/agents/ResponseProcessor.ts
src/services/worker/agents/index.ts
src/services/worker/agents/types.ts
tests/worker/agents/response-processor.test.ts

- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space before the optional 's', so 'resets at 7pm' never matched; corrected to /\bresets?\s+(at\s+\d|in\s+\d)/i - Preserve messages on malformed/truncated XML responses: the previous guard only fired when text had no XML markers; truncated XML like '<observation><type>bugfix' would pass the guard and fall through to storeObservations()+confirmProcessed(), recreating the message-loss bug; now also fires when XML markers are present but parser yields nothing (unless it's an intentional <skip_summary>-only response) - Roll back OpenRouter conversationHistory on rate_limited or error: the prompt is pushed before the LLM call; on failure the session object is reused on the next retry claim, causing duplicate prompts to accumulate; pop the last entry before returning on error/rate_limited - Add test for malformed/truncated XML preservation (25 tests, all pass)

gchaix · 2026-04-07T22:57:33Z

Fixed in cdda789 (re: the outside-diff comment on OpenRouterAgent.ts lines 203-223 and 254-274).

The issue was real: session.conversationHistory is persisted on the in-memory ActiveSession object which is reused across retry claims (SessionManager.initializeSession returns the cached session if it already exists). So a prompt pushed before a failed LLM call would accumulate on subsequent retries.

Fix: added a pop() of the last conversationHistory entry immediately after processAgentResponse returns rate_limited or error, for both the observation and summary call sites.

- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space before the optional 's', so 'resets at 7pm' never matched; corrected to /\bresets?\s+(at\s+\d|in\s+\d)/i - Preserve messages on malformed/truncated XML responses: the previous guard only fired when text had no XML markers; truncated XML like '<observation><type>bugfix' would pass the guard and fall through to storeObservations()+confirmProcessed(), recreating the message-loss bug; now also fires when XML markers are present but parser yields nothing (unless it's an intentional <skip_summary>-only response) - Roll back OpenRouter conversationHistory on rate_limited or error: the prompt is pushed before the LLM call; on failure the session object is reused on the next retry claim, causing duplicate prompts to accumulate; pop the last entry before returning on error/rate_limited - Add test for malformed/truncated XML preservation (25 tests, all pass)

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/services/worker/SDKAgent.ts (1)

264-284: ⚠️ Potential issue | 🟠 Major

Missing 'error' status handling and conversation history rollback.

SDKAgent only handles 'rate_limited' status but ignores 'error' status. Per ResponseProcessor.ts (context snippet 1), processAgentResponse returns { status: 'error' } for:

Empty responses with pending messages (line 153)
Auth errors (line 194)
Malformed/truncated XML (line 203)

Additionally, OpenRouterAgent (context snippet 2, lines 218-226) handles both statuses and rolls back the conversation history:

if (obsResult.status === 'rate_limited' || obsResult.status === 'error') {
  session.conversationHistory.pop();
}

Without this handling:

Auth errors and malformed XML responses are silently ignored, allowing the loop to continue with corrupted state
The pushed assistant response accumulates in conversationHistory on retries, causing context drift

Suggested fix (align with OpenRouterAgent pattern)

           // Parse and process response using shared ResponseProcessor
           const result = await processAgentResponse(
             textContent,
             session,
             this.dbManager,
             this.sessionManager,
             worker,
             discoveryTokens,
             originalTimestamp,
             'SDK',
             cwdTracker.lastCwd,
             modelId
           );

+          // Roll back conversation history on error/rate-limit to prevent accumulation
+          if (result.status === 'rate_limited' || result.status === 'error') {
+            session.conversationHistory.pop();
+          }
+
           // Rate-limited: abort — subsequent calls will also fail
           if (result.status === 'rate_limited') {
             logger.warn('SDK', 'Aborting session due to rate limiting', {
               sessionDbId: session.sessionDbId
             });
             session.abortController.abort();
             return;
           }
+
+          // Error: abort — auth errors and malformed responses indicate non-recoverable state
+          if (result.status === 'error') {
+            logger.warn('SDK', 'Aborting session due to error response', {
+              sessionDbId: session.sessionDbId
+            });
+            session.abortController.abort();
+            return;
+          }
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/SDKAgent.ts` around lines 264 - 284, process the 'error'
status returned by processAgentResponse in SDKAgent the same way OpenRouterAgent
does: after calling processAgentResponse (in SDKAgent.ts) check if result.status
=== 'rate_limited' || result.status === 'error' and if so pop the last message
from session.conversationHistory; if result.status === 'rate_limited'
additionally log the warning and abort via session.abortController.abort() and
return, whereas for 'error' just rollback the conversationHistory and return so
retries don't accumulate corrupted assistant messages.

src/services/worker/agents/ResponseProcessor.ts (1)

139-154: ⚠️ Potential issue | 🟠 Major

Conversation history entry is not rolled back on error paths.

Line 141 pushes the assistant text to conversationHistory unconditionally (when text is non-empty). However, for the unproductive response guards (lines 179-203), the function returns early after the push but without popping the entry. This leaves error/rate-limit response text in the conversation history, which can cause accumulation on retries.

The guards at lines 147-153 are fine since empty responses don't push anything. But the non-empty error paths need rollback.

Suggested fix

     if (isRateLimitResponse(text)) {
       logger.warn('PARSER', `${agentName} returned rate-limit response; messages preserved for retry`, {
         sessionId: session.sessionDbId,
         preview
       });
+      // Roll back the assistant response we just pushed
+      session.conversationHistory.pop();
       preserveMessages(session, sessionManager, worker, 'rate_limit');
       return { status: 'rate_limited', observationCount: 0, summaryStored: false };
     }
 
     if (isAuthErrorResponse(text)) {
       logger.error('PARSER', `${agentName} returned auth error response; messages preserved for retry`, {
         sessionId: session.sessionDbId,
         preview
       });
+      // Roll back the assistant response we just pushed
+      session.conversationHistory.pop();
       preserveMessages(session, sessionManager, worker, 'auth_error');
       return { status: 'error', observationCount: 0, summaryStored: false };
     }
 
     const reason = hasXmlMarkers ? 'malformed_xml' : 'non_xml';
     logger.warn('PARSER', `${agentName} returned ${reason === 'malformed_xml' ? 'malformed/truncated XML' : 'non-XML'} response; messages preserved for retry`, {
       sessionId: session.sessionDbId,
       preview
     });
+    // Roll back the assistant response we just pushed
+    session.conversationHistory.pop();
     preserveMessages(session, sessionManager, worker, reason);
     return { status: 'error', observationCount: 0, summaryStored: false };

Alternatively, if the rollback is intended to be handled by callers (as mentioned in PR comments), then ResponseProcessor should document this contract clearly—but the context snippet 2 shows OpenRouterAgent doing the pop at the call site, which creates inconsistency with this centralized approach.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/agents/ResponseProcessor.ts` around lines 139 - 154, The
code pushes assistant text into session.conversationHistory via
session.conversationHistory.push({ role: 'assistant', content: text }) before
several early-return error guards, which leaves erroneous/rate-limit responses
in history on retry; fix by ensuring any early-return paths (e.g., the
empty-response guard that calls preserveMessages(session, sessionManager,
worker, 'empty_response') and others that return { status: 'error', ... })
remove the last pushed entry (pop) or move the push until after all guards
succeed so conversationHistory is only mutated on final successful responses,
keeping the behavior consistent with callers like OpenRouterAgent that currently
pop on error.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/services/worker/agents/ResponseProcessor.ts`:
- Around line 139-154: The code pushes assistant text into
session.conversationHistory via session.conversationHistory.push({ role:
'assistant', content: text }) before several early-return error guards, which
leaves erroneous/rate-limit responses in history on retry; fix by ensuring any
early-return paths (e.g., the empty-response guard that calls
preserveMessages(session, sessionManager, worker, 'empty_response') and others
that return { status: 'error', ... }) remove the last pushed entry (pop) or move
the push until after all guards succeed so conversationHistory is only mutated
on final successful responses, keeping the behavior consistent with callers like
OpenRouterAgent that currently pop on error.

In `@src/services/worker/SDKAgent.ts`:
- Around line 264-284: process the 'error' status returned by
processAgentResponse in SDKAgent the same way OpenRouterAgent does: after
calling processAgentResponse (in SDKAgent.ts) check if result.status ===
'rate_limited' || result.status === 'error' and if so pop the last message from
session.conversationHistory; if result.status === 'rate_limited' additionally
log the warning and abort via session.abortController.abort() and return,
whereas for 'error' just rollback the conversationHistory and return so retries
don't accumulate corrupted assistant messages.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 285b78d1-52d1-4fb7-a267-b0b2b5985c8c

📥 Commits

Reviewing files that changed from the base of the PR and between cdda789 and d8a24cb.

📒 Files selected for processing (7)

src/services/worker/GeminiAgent.ts
src/services/worker/OpenRouterAgent.ts
src/services/worker/SDKAgent.ts
src/services/worker/agents/ResponseProcessor.ts
src/services/worker/agents/index.ts
src/services/worker/agents/types.ts
tests/worker/agents/response-processor.test.ts

✅ Files skipped from review due to trivial changes (2)

src/services/worker/agents/types.ts
src/services/worker/GeminiAgent.ts

🚧 Files skipped from review as they are similar to previous changes (3)

src/services/worker/agents/index.ts
src/services/worker/OpenRouterAgent.ts
tests/worker/agents/response-processor.test.ts

- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space before the optional 's', so 'resets at 7pm' never matched; corrected to /\bresets?\s+(at\s+\d|in\s+\d)/i - Preserve messages on malformed/truncated XML responses: the previous guard only fired when text had no XML markers; truncated XML like '<observation><type>bugfix' would pass the guard and fall through to storeObservations()+confirmProcessed(), recreating the message-loss bug; now also fires when XML markers are present but parser yields nothing (unless it's an intentional <skip_summary>-only response) - Roll back OpenRouter conversationHistory on rate_limited or error: the prompt is pushed before the LLM call; on failure the session object is reused on the next retry claim, causing duplicate prompts to accumulate; pop the last entry before returning on error/rate_limited - Add test for malformed/truncated XML preservation (25 tests, all pass)

…onses Previously, processAgentResponse() called confirmProcessed() (which DELETEs pending_messages rows) regardless of whether the LLM actually produced parseable XML. Any non-XML response — auth errors, rate limits, empty responses, or garbled output — would silently delete the queued tool interactions with zero observations stored. This was observed in production when stale auth tokens caused the Claude SDK to return '401 Invalid bearer token' as response text. Hours of observations were permanently lost. Changes: - Add ProcessAgentResponseResult type (status: ok|rate_limited|error|empty) so callers can inspect what happened - Add isRateLimitResponse() and isAuthErrorResponse() classification helpers - Add preserveMessages() helper that calls markFailed() on all in-flight message IDs, routing them through the existing retry logic (up to 3 attempts) rather than silently deleting them - processAgentResponse() now returns early with markFailed() on all non-XML paths: empty response with pending messages, rate-limit text, auth error text, and any other unknown non-XML output - All three agent call sites (SDKAgent, GeminiAgent, OpenRouterAgent) updated to capture the return value and abort the session on rate_limited to prevent wasting further LLM calls - GeminiAgent's existing "leave for stale recovery" pattern (5-min wait) replaced by direct markFailed() calls for immediate retry - Tests updated: non-XML test now asserts markFailed() is called and storeObservations() is NOT called; 7 new test cases cover rate-limit, auth error, empty-with-ids, empty-without-ids, and detection functions

- Fix regex bug in isRateLimitResponse: /reset s?/ had a spurious space before the optional 's', so 'resets at 7pm' never matched; corrected to /\bresets?\s+(at\s+\d|in\s+\d)/i - Preserve messages on malformed/truncated XML responses: the previous guard only fired when text had no XML markers; truncated XML like '<observation><type>bugfix' would pass the guard and fall through to storeObservations()+confirmProcessed(), recreating the message-loss bug; now also fires when XML markers are present but parser yields nothing (unless it's an intentional <skip_summary>-only response) - Roll back OpenRouter conversationHistory on rate_limited or error: the prompt is pushed before the LLM call; on failure the session object is reused on the next retry claim, causing duplicate prompts to accumulate; pop the last entry before returning on error/rate_limited - Add test for malformed/truncated XML preservation (25 tests, all pass)

Two related issues: 1. ResponseProcessor was pushing the assistant response to conversationHistory before the early-return guards. On any error path (empty response, rate limit, auth error, non-XML, malformed XML) the bad response text was left in history, so the next retry attempt would carry corrupted context. Fix: move the push to after all guards succeed. Callers that push a user prompt before calling processAgentResponse can then pop their push (one level) on non-ok status to keep history clean. 2. SDKAgent handled rate_limited by aborting but did not pop conversationHistory on either rate_limited or error. On error it also continued rather than returning, allowing subsequent loop iterations to run after a failed response. Fix: pop conversationHistory on both rate_limited and error (mirrors what OpenRouterAgent already does), then return immediately on error (matching the abort-and-return on rate_limited). Adds three new tests asserting conversationHistory is empty after empty-response, non-XML, and rate-limit error paths. 28 tests, all pass.

The `return` on `result.status === 'error'` added in d184b67 caused the SDK subprocess to be killed on every empty response, triggering crash-recovery loops that burned through the 3-restart limit. This was observed in production: after a worker restart, INIT prompts returned empty responses transiently. The `return` killed the generator, crash-recovery restarted it, another empty response killed it again, and after 3 cycles all pending messages were permanently stuck as failed. Fix: remove the `return` on error so the generator loop continues to the next message. The failed messages are already preserved via markFailed() for retry. Only rate_limited warrants aborting, since subsequent API calls will also fail. Before this commit (d184b67): error → return → kill subprocess → restart After this commit: error → continue loop → process next message

…race fix - PendingMessageStore: add drainSessionMessages() — on session end, requeues messages with retries remaining instead of permanently failing them all. Deprecates markAllSessionMessagesAbandoned(). - worker-service + SessionCompletionHandler: migrate both drain call sites to drainSessionMessages(), logging { failed, requeued } counts. - ResponseProcessor: treat empty LLM response as intentional skip (confirmed + deleted) instead of an error requiring retry. Adds 'skipped' to the ProcessAgentResponseResult status union. - SessionRoutes: fix race condition in handleStatusByClaudeId — when session is not found in the active map, fall back to DB getPendingCount() instead of hardcoding queueLength: 0. This was causing the Stop hook to exit early before summarize messages were processed, explaining low summary counts. - summarize.ts: add status?: string to queue-status type assertion to match the race fix above. - tests: update response-processor tests for new skip semantics; add 8-test suite for drainSessionMessages() covering retries, exhaustion, isolation, and edge cases. - scripts: add inspect-failed.sh and requeue-failed.sh for operational debugging of stuck pending_messages.

…empty observations - tag-stripping: add TASK_NOTIFICATION_REGEX to strip <task-notification> blocks from user prompts. Claude Code injects these into the UserPromptSubmit hook when a background task completes at prompt submission time; they are system events, not user content, and were being persisted as prompt records. - prompts.ts: strengthen summary mode-switch header with visual === banners, CRITICAL/STOP language, and a closing reminder. The observer subprocess was returning <observation> tags for summary requests due to established conversation context overriding a weak mode-switch instruction. - parser.ts: skip observation shells with no title, narrative, or facts before storage. The LLM occasionally emits <observation><type>bugfix</type></observation> placeholders with no content, which appeared in the UI as blank "Untitled" cards. The "ALWAYS save observations" directive applies to observations with content.

gchaix · 2026-04-10T17:30:14Z

@coderabbitai review

coderabbitai · 2026-04-10T17:30:20Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

src/sdk/parser.ts (1)
167-174: ⚠️ Potential issue | 🟡 Minor

Preserve notes-only summaries.

notes is a first-class field in ParsedSummary, but this false-positive guard ignores it. <summary><notes>...</notes></summary> now returns null even though the comment above says summaries should still be saved when fields are missing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sdk/parser.ts` around lines 167 - 174, The false-positive guard in the
parser uses the condition if (!request && !investigated && !learned &&
!completed && !next_steps) which ignores the first-class ParsedSummary field
notes and causes summaries that only contain <notes> to be dropped; update the
guard to include notes (e.g., if (!request && !investigated && !learned &&
!completed && !next_steps && !notes) so summaries with notes are preserved,
leaving the logger.warn and return null behavior unchanged when truly empty.
src/services/worker/OpenRouterAgent.ts (1)
203-228: ⚠️ Potential issue | 🟠 Major

Roll back prompts when the OpenRouter call throws, not just on parsed error statuses.

These pops only run after processAgentResponse() returns. If queryOpenRouterMultiTurn() throws for 401/429/network/API errors, the prompt pushed at Line 190 or Line 246 stays in conversationHistory, and the next retry replays stale context. Add a local rollback path around each request as well.

Also applies to: 259-284
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/OpenRouterAgent.ts` around lines 203 - 228, The code only
rolls back the prompt after processAgentResponse() returns, so if
queryOpenRouterMultiTurn() throws (401/429/network/API errors) the pushed prompt
remains in session.conversationHistory; wrap each call to
queryOpenRouterMultiTurn() in a try/catch (both the observation call and the
later call referenced at the other site) and in the catch block immediately pop
the last entry from session.conversationHistory to undo the push, then log the
error and rethrow or return consistent with existing error handling (mirroring
the existing rollback/return logic used when obsResult.status indicates
rate_limited/error); reference queryOpenRouterMultiTurn, processAgentResponse,
and session.conversationHistory.pop for locating the changes.
src/services/worker/SDKAgent.ts (1)

264-297: ⚠️ Potential issue | 🟠 Major

The stale-prompt rollback still misses thrown auth failures.

This only pops when processAgentResponse() returns rate_limited or error. The explicit Invalid API key throw at Line 259 happens after the current user prompt has already been pushed in createMessageGenerator(), so retries on the same ActiveSession can still accumulate duplicate prompts. The rollback needs to run in the per-message exception path too.
src/services/worker/GeminiAgent.ts (1)
176-194: ⚠️ Potential issue | 🟠 Major

Rollback the failed turn from conversationHistory before returning.

These early returns leave the just-appended prompt/response in the cached ActiveSession, so the next retry replays duplicate context. That is the exact retry-history leak called out in the PR notes, and it still happens here for init/observation/summary failures.
♻️ Suggested fix
+      const initHistoryBase = session.conversationHistory.length;
       session.conversationHistory.push({ role: 'user', content: initPrompt });
       const initResponse = await this.queryGeminiMultiTurn(session.conversationHistory, apiKey, model, rateLimitingEnabled);

       if (initResponse.content) {
         session.conversationHistory.push({ role: 'assistant', content: initResponse.content });
         // ...
         const initResult = await processAgentResponse(/* ... */);

         if (initResult.status === 'rate_limited' || initResult.status === 'error') {
+          session.conversationHistory.splice(initHistoryBase);
           logger.warn('SDK', `Gemini init response failed (${initResult.status}), aborting session`, {
             sessionId: session.sessionDbId
           });
           return;
         }
       }
Apply the same historyBase/splice() pattern to the observation and summary turns before any non-ok early return.
Also applies to: 255-276, 307-328
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/worker/GeminiAgent.ts` around lines 176 - 194, The early returns
after calling processAgentResponse (e.g., handling initResult.status ===
'rate_limited' || 'error') leave the just-appended turn in the
ActiveSession.conversationHistory; before any non-'ok' return you must rollback
that failed turn using the same historyBase/splice pattern: capture historyBase
prior to appending (or compute index where the new turn(s) were pushed) and call
conversationHistory.splice(historyBase) to remove the appended
init/observation/summary turn(s) so retries don’t replay duplicate context;
apply this fix in the init, observation, and summary handling blocks around
processAgentResponse in GeminiAgent (use the existing historyBase variable and
splice() pattern already used elsewhere).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/inspect-failed.sh`:
- Around line 71-84: The SQL uses unixepoch on millisecond timestamps and
references a non-existent column; update the datetime conversions to divide ms
by 1000 (e.g. datetime(created_at_epoch/1000, 'unixepoch', 'localtime')) and
replace the invalid failed_at_epoch reference with the actual schema column (or
a safe COALESCE fallback) in the SELECT for the pending_messages query inside
scripts/inspect-failed.sh; ensure both created_at_epoch and the
failure-timestamp column you choose are divided by 1000 and wrapped with
COALESCE(..., '—') so the while read loop (id, sess_id, msg_type, ...) can run
without errors.

In `@src/sdk/parser.ts`:
- Around line 53-62: The current emptiness guard uses hasContent = !!(title ||
narrative || facts.length > 0) which wrongly treats observations that only have
persisted fields like files_read or concept entries as empty; update the
hasContent check in src/sdk/parser.ts to also include all other persisted
observation fields (e.g., files_read/filesRead, concepts, references,
attachments or any other stored arrays/objects) using proper truthiness checks
(e.g., array.length > 0 or Object.keys(obj).length > 0) so those observations
are not skipped, and adjust the logger message near the hasContent check (and
any related logic in ResponseProcessor) to reflect which fields were considered
when deciding emptiness.

In `@src/services/sqlite/PendingMessageStore.ts`:
- Around line 297-303: The UPDATE uses a non-existent column failed_at_epoch on
the pending_messages table; change the UPDATE in failStmt (and any related uses)
to set the existing terminal timestamp column completed_at_epoch instead, i.e.
use completed_at_epoch = ? with the same now parameter, or if you intend a
separate failure timestamp add a DB migration to create failed_at_epoch and
update all readers/writers consistently (refer to failStmt, pending_messages,
failed_at_epoch, completed_at_epoch, and this.maxRetries to locate the code).

In `@src/services/worker/agents/ResponseProcessor.ts`:
- Around line 189-205: The auth-path in ResponseProcessor currently treats
isAuthErrorResponse(text) like a generic error; update it to return a distinct
fatal status (e.g., return { status: 'auth_error', observationCount: 0,
summaryStored: false }) instead of { status: 'error' ... } so callers can abort
retries; keep the preserveMessages(session, sessionManager, worker,
'auth_error') call and the existing log, and ensure any callers that switch on
returned status can handle the 'auth_error' value.
- Around line 139-155: The code currently treats any empty text as an
intentional skip and immediately confirms/clears session.processingMessageIds
(using pendingStore.confirmProcessed and cleanupProcessedMessages), which can
drop real queued work; change the guard in ResponseProcessor (the block that
checks if (!text.trim() && session.processingMessageIds.length > 0)) to only
treat an empty response as a skip when the observation includes an explicit skip
marker (e.g., observation.intentionalSkip or a new boolean returned by
OpenRouterAgent.queryOpenRouterMultiTurn), and otherwise do not call
pendingStore.confirmProcessed or clear session.processingMessageIds—leave the
messages for retry/inspection and return a non-skipped error status; update the
agent call (OpenRouterAgent.queryOpenRouterMultiTurn) to propagate this explicit
flag if necessary and adjust logging to reflect whether the empty response was
an explicit skip or an upstream failure.

---

Outside diff comments:
In `@src/sdk/parser.ts`:
- Around line 167-174: The false-positive guard in the parser uses the condition
if (!request && !investigated && !learned && !completed && !next_steps) which
ignores the first-class ParsedSummary field notes and causes summaries that only
contain <notes> to be dropped; update the guard to include notes (e.g., if
(!request && !investigated && !learned && !completed && !next_steps && !notes)
so summaries with notes are preserved, leaving the logger.warn and return null
behavior unchanged when truly empty.

In `@src/services/worker/GeminiAgent.ts`:
- Around line 176-194: The early returns after calling processAgentResponse
(e.g., handling initResult.status === 'rate_limited' || 'error') leave the
just-appended turn in the ActiveSession.conversationHistory; before any non-'ok'
return you must rollback that failed turn using the same historyBase/splice
pattern: capture historyBase prior to appending (or compute index where the new
turn(s) were pushed) and call conversationHistory.splice(historyBase) to remove
the appended init/observation/summary turn(s) so retries don’t replay duplicate
context; apply this fix in the init, observation, and summary handling blocks
around processAgentResponse in GeminiAgent (use the existing historyBase
variable and splice() pattern already used elsewhere).

In `@src/services/worker/OpenRouterAgent.ts`:
- Around line 203-228: The code only rolls back the prompt after
processAgentResponse() returns, so if queryOpenRouterMultiTurn() throws
(401/429/network/API errors) the pushed prompt remains in
session.conversationHistory; wrap each call to queryOpenRouterMultiTurn() in a
try/catch (both the observation call and the later call referenced at the other
site) and in the catch block immediately pop the last entry from
session.conversationHistory to undo the push, then log the error and rethrow or
return consistent with existing error handling (mirroring the existing
rollback/return logic used when obsResult.status indicates rate_limited/error);
reference queryOpenRouterMultiTurn, processAgentResponse, and
session.conversationHistory.pop for locating the changes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5e764ab7-2e70-47e0-9e08-187ff4659353

📥 Commits

Reviewing files that changed from the base of the PR and between d184b67 and 6697aac.

📒 Files selected for processing (21)

plugin/scripts/mcp-server.cjs
plugin/scripts/worker-service.cjs
plugin/ui/viewer-bundle.js
scripts/inspect-failed.sh
scripts/requeue-failed.sh
src/cli/handlers/summarize.ts
src/sdk/parser.ts
src/sdk/prompts.ts
src/services/sqlite/PendingMessageStore.ts
src/services/worker-service.ts
src/services/worker/GeminiAgent.ts
src/services/worker/OpenRouterAgent.ts
src/services/worker/SDKAgent.ts
src/services/worker/agents/ResponseProcessor.ts
src/services/worker/agents/index.ts
src/services/worker/agents/types.ts
src/services/worker/http/routes/SessionRoutes.ts
src/services/worker/session/SessionCompletionHandler.ts
src/utils/tag-stripping.ts
tests/services/sqlite/pending-message-drain.test.ts
tests/worker/agents/response-processor.test.ts

✅ Files skipped from review due to trivial changes (1)

src/cli/handlers/summarize.ts

🚧 Files skipped from review as they are similar to previous changes (2)

src/services/worker/agents/index.ts
src/services/worker/agents/types.ts

coderabbitai · 2026-04-10T17:37:15Z

+sqlite3 -separator $'\t' "$DB" "
+  SELECT
+    id,
+    content_session_id,
+    message_type,
+    COALESCE(tool_name, '—'),
+    retry_count,
+    COALESCE(datetime(created_at_epoch, 'unixepoch', 'localtime'), '—'),
+    COALESCE(datetime(failed_at_epoch,   'unixepoch', 'localtime'), '—'),
+    COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '—')
+  FROM pending_messages
+  WHERE ${WHERE}
+  ORDER BY id;
+" | while IFS=$'\t' read -r id sess_id msg_type tool retries created failed preview; do


⚠️ Potential issue | 🔴 Critical

The inspection query is using the wrong timestamp fields and units.

pending_messages stores Date.now()-style millisecond epochs, so datetime(..., 'unixepoch') prints bogus dates unless you divide by 1000. On top of that, failed_at_epoch is not in the current schema, so this query will error before the script can display anything.

🛠️ Suggested fix

COALESCE(tool_name, '—'), retry_count, - COALESCE(datetime(created_at_epoch, 'unixepoch', 'localtime'), '—'), - COALESCE(datetime(failed_at_epoch, 'unixepoch', 'localtime'), '—'), + COALESCE(datetime(created_at_epoch / 1000, 'unixepoch', 'localtime'), '—'), + COALESCE(datetime(completed_at_epoch / 1000, 'unixepoch', 'localtime'), '—'), COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '—')

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

sqlite3 -separator $'\t' "$DB" "

SELECT

id,

content_session_id,

message_type,

COALESCE(tool_name, '—'),

retry_count,

COALESCE(datetime(created_at_epoch, 'unixepoch', 'localtime'), '—'),

COALESCE(datetime(failed_at_epoch, 'unixepoch', 'localtime'), '—'),

COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '—')

FROM pending_messages

WHERE ${WHERE}

ORDER BY id;

" | while IFS=$'\t' read -r id sess_id msg_type tool retries created failed preview; do

sqlite3 -separator $'\t' "$DB" "

SELECT

id,

content_session_id,

message_type,

COALESCE(tool_name, '—'),

retry_count,

COALESCE(datetime(created_at_epoch / 1000, 'unixepoch', 'localtime'), '—'),

COALESCE(datetime(completed_at_epoch / 1000, 'unixepoch', 'localtime'), '—'),

COALESCE(substr(tool_response, 1, 120), substr(last_assistant_message, 1, 120), '—')

FROM pending_messages

WHERE ${WHERE}

ORDER BY id;

" | while IFS=$'\t' read -r id sess_id msg_type tool retries created failed preview; do

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/inspect-failed.sh` around lines 71 - 84, The SQL uses unixepoch on millisecond timestamps and references a non-existent column; update the datetime conversions to divide ms by 1000 (e.g. datetime(created_at_epoch/1000, 'unixepoch', 'localtime')) and replace the invalid failed_at_epoch reference with the actual schema column (or a safe COALESCE fallback) in the SELECT for the pending_messages query inside scripts/inspect-failed.sh; ensure both created_at_epoch and the failure-timestamp column you choose are divided by 1000 and wrapped with COALESCE(..., '—') so the while read loop (id, sess_id, msg_type, ...) can run without errors.

coderabbitai · 2026-04-10T17:37:15Z

+    // Guard: skip observations with no meaningful content (title, narrative, facts all empty).
+    // These are LLM-generated shells — e.g. <observation><type>bugfix</type></observation> —
+    // that provide no value and pollute the UI with blank "Untitled" cards.
+    // NOTE: "ALWAYS save observations" (10/24/2025) applies to observations with content;
+    // it was never intended to preserve completely empty shells.
+    const hasContent = !!(title || narrative || facts.length > 0);
+    if (!hasContent) {
+      logger.warn('PARSER', 'Skipping empty observation shell (no title, narrative, or facts)', { correlationId, type });
+      continue;
+    }


⚠️ Potential issue | 🟠 Major

Don't treat file-only or concept-only observations as empty.

This guard only counts title, narrative, and facts as meaningful content. A structured response like <observation><type>bugfix</type><files_read>...</files_read></observation> now gets dropped here, and ResponseProcessor will classify the whole response as malformed/empty instead of storing it or treating it as a valid observation. Include the other persisted fields in the emptiness check too.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/sdk/parser.ts` around lines 53 - 62, The current emptiness guard uses hasContent = !!(title || narrative || facts.length > 0) which wrongly treats observations that only have persisted fields like files_read or concept entries as empty; update the hasContent check in src/sdk/parser.ts to also include all other persisted observation fields (e.g., files_read/filesRead, concepts, references, attachments or any other stored arrays/objects) using proper truthiness checks (e.g., array.length > 0 or Object.keys(obj).length > 0) so those observations are not skipped, and adjust the logger message near the hasContent check (and any related logic in ResponseProcessor) to reflect which fields were considered when deciding emptiness.

coderabbitai · 2026-04-10T17:37:16Z

+    const failStmt = this.db.prepare(`
+      UPDATE pending_messages
+      SET status = 'failed', failed_at_epoch = ?
+      WHERE session_db_id = ? AND status IN ('pending', 'processing')
+        AND retry_count >= ?
+    `);
+    const failResult = failStmt.run(now, sessionDbId, this.maxRetries);


⚠️ Potential issue | 🔴 Critical

Use the schema’s terminal timestamp column here.

pending_messages does not have a failed_at_epoch column in the current schema; it only has completed_at_epoch. This update will fail as soon as a drained message has exhausted retries, and the new test/script changes are already assuming the same nonexistent column downstream.

🛠️ Suggested fix

const failStmt = this.db.prepare(` UPDATE pending_messages - SET status = 'failed', failed_at_epoch = ? + SET status = 'failed', completed_at_epoch = ? WHERE session_db_id = ? AND status IN ('pending', 'processing') AND retry_count >= ? `);

If you want a dedicated failure timestamp, add a migration first and update all readers/writers consistently.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const failStmt = this.db.prepare(`

UPDATE pending_messages

SET status = 'failed', failed_at_epoch = ?

WHERE session_db_id = ? AND status IN ('pending', 'processing')

AND retry_count >= ?

`);

const failResult = failStmt.run(now, sessionDbId, this.maxRetries);

const failStmt = this.db.prepare(`

UPDATE pending_messages

SET status = 'failed', completed_at_epoch = ?

WHERE session_db_id = ? AND status IN ('pending', 'processing')

AND retry_count >= ?

`);

const failResult = failStmt.run(now, sessionDbId, this.maxRetries);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/services/sqlite/PendingMessageStore.ts` around lines 297 - 303, The UPDATE uses a non-existent column failed_at_epoch on the pending_messages table; change the UPDATE in failStmt (and any related uses) to set the existing terminal timestamp column completed_at_epoch instead, i.e. use completed_at_epoch = ? with the same now parameter, or if you intend a separate failure timestamp add a DB migration to create failed_at_epoch and update all readers/writers consistently (refer to failStmt, pending_messages, failed_at_epoch, completed_at_epoch, and this.maxRetries to locate the code).

coderabbitai · 2026-04-10T17:37:16Z

+  // --- GUARD: Empty response with pending messages ---
+  // The observation prompt explicitly allows the LLM to return empty to skip trivial
+  // tool uses. Genuine API failures produce non-empty error text (rate-limit notices,
+  // auth errors, HTTP bodies), so an empty string here is an intentional skip, not a
+  // transient error. Confirm and delete the messages rather than retrying them.
+  if (!text.trim() && session.processingMessageIds.length > 0) {
+    const pendingStore = sessionManager.getPendingMessageStore();
+    for (const messageId of session.processingMessageIds) {
+      pendingStore.confirmProcessed(messageId);
+    }
+    logger.info('QUEUE', `SKIPPED | sessionDbId=${session.sessionDbId} | reason=intentional_skip | count=${session.processingMessageIds.length} | ids=[${session.processingMessageIds.join(',')}]`, {
+      sessionId: session.sessionDbId
+    });
+    session.processingMessageIds = [];
+    cleanupProcessedMessages(session, worker);
+    return { status: 'skipped', observationCount: 0, summaryStored: false };
  }


⚠️ Potential issue | 🟠 Major

Empty responses can still drop queued messages.

This branch confirms and clears processingMessageIds before the caller can distinguish an intentional skip from an upstream failure. OpenRouterAgent.queryOpenRouterMultiTurn() already normalizes one API failure mode to content: '', so this path can still delete real queued work on empty provider responses. If empty strings need to mean “skip”, that signal should be explicit rather than inferred from !text.trim().

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/services/worker/agents/ResponseProcessor.ts` around lines 139 - 155, The code currently treats any empty text as an intentional skip and immediately confirms/clears session.processingMessageIds (using pendingStore.confirmProcessed and cleanupProcessedMessages), which can drop real queued work; change the guard in ResponseProcessor (the block that checks if (!text.trim() && session.processingMessageIds.length > 0)) to only treat an empty response as a skip when the observation includes an explicit skip marker (e.g., observation.intentionalSkip or a new boolean returned by OpenRouterAgent.queryOpenRouterMultiTurn), and otherwise do not call pendingStore.confirmProcessed or clear session.processingMessageIds—leave the messages for retry/inspection and return a non-skipped error status; update the agent call (OpenRouterAgent.queryOpenRouterMultiTurn) to propagate this explicit flag if necessary and adjust logging to reflect whether the empty response was an explicit skip or an upstream failure.

coderabbitai · 2026-04-10T17:37:16Z

+    if (isAuthErrorResponse(text)) {
+      logger.error('PARSER', `${agentName} returned auth error response; messages preserved for retry`, {
+        sessionId: session.sessionDbId,
+        preview
+      });
+      preserveMessages(session, sessionManager, worker, 'auth_error');
+      return { status: 'error', observationCount: 0, summaryStored: false };
+    }
+
+    const reason = hasXmlMarkers ? 'malformed_xml' : 'non_xml';
+    logger.warn('PARSER', `${agentName} returned ${reason === 'malformed_xml' ? 'malformed/truncated XML' : 'non-XML'} response; messages preserved for retry`, {
      sessionId: session.sessionDbId,
      preview
    });
+    preserveMessages(session, sessionManager, worker, reason);
+    return { status: 'error', observationCount: 0, summaryStored: false };
+  }


⚠️ Potential issue | 🟠 Major

Auth failures need a distinct result status.

isAuthErrorResponse() identifies a deterministic credential/configuration failure, but this still returns status: 'error'. The current callers only abort on rate_limited, so an expired token can burn one retry for every queued message in the same session before anything stops. Returning a dedicated auth_error/fatal status would let agents abort immediately while keeping transient parse errors retryable.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/services/worker/agents/ResponseProcessor.ts` around lines 189 - 205, The auth-path in ResponseProcessor currently treats isAuthErrorResponse(text) like a generic error; update it to return a distinct fatal status (e.g., return { status: 'auth_error', observationCount: 0, summaryStored: false }) instead of { status: 'error' ... } so callers can abort retries; keep the preserveMessages(session, sessionManager, worker, 'auth_error') call and the existing log, and ensure any callers that switch on returned status can handle the 'auth_error' value.

thedotmack · 2026-04-15T20:51:15Z

Closed during the April 2026 backlog cleanup. The underlying bug is now tracked in #1874, which is the single canonical issue being addressed by the maintainer. Thanks for taking the time to report — your symptoms and repro are captured in the consolidated ticket.

coderabbitai Bot requested changes Apr 7, 2026

View reviewed changes

Comment thread CHANGELOG.md Outdated

Comment thread CHANGELOG.md Outdated

Comment thread src/services/worker/agents/ResponseProcessor.ts

Comment thread src/services/worker/agents/ResponseProcessor.ts Outdated

gchaix force-pushed the fix/preserve-failed-observations branch from cdda789 to d8a24cb Compare April 8, 2026 16:55

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

coderabbitai Bot previously approved these changes Apr 8, 2026

View reviewed changes

gchaix dismissed coderabbitai[bot]’s stale review via d184b67 April 8, 2026 17:10

coderabbitai Bot previously approved these changes Apr 8, 2026

View reviewed changes

gchaix dismissed coderabbitai[bot]’s stale review via 85a537f April 8, 2026 20:00

coderabbitai Bot previously approved these changes Apr 8, 2026

View reviewed changes

gchaix added 6 commits April 10, 2026 10:22

gchaix dismissed coderabbitai[bot]’s stale review via 6697aac April 10, 2026 17:23

gchaix force-pushed the fix/preserve-failed-observations branch from 85a537f to 6697aac Compare April 10, 2026 17:23

coderabbitai Bot requested changes Apr 10, 2026

View reviewed changes

thedotmack mentioned this pull request Apr 15, 2026

[BUG] processAgentResponse silently deletes queued messages on non-XML LLM output #1874

Closed

thedotmack closed this Apr 15, 2026

Uh oh!

Conversation

gchaix commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What Changed

Tests

Existing Safety Properties Preserved

Uh oh!

coderabbitai Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Summary by CodeRabbit

Release Notes

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gchaix commented Apr 7, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gchaix commented Apr 10, 2026

Uh oh!

coderabbitai Bot commented Apr 10, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

thedotmack commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gchaix commented Apr 7, 2026 •

edited

Loading

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading