Skip to content

feat: connect orchestrator and frontend#1564

Open
Scra3 wants to merge 199 commits into
mainfrom
feat/prd-214-server-step-mapper
Open

feat: connect orchestrator and frontend#1564
Scra3 wants to merge 199 commits into
mainfrom
feat/prd-214-server-step-mapper

Conversation

@Scra3
Copy link
Copy Markdown
Member

@Scra3 Scra3 commented Apr 20, 2026

MAIN BRANCH TO INTRODUCE WORKFLOW EXECUTOR.

image

Note

Connect workflow orchestrator to frontend by adding a workflow executor service and agent proxy routes

  • Introduces the @forestadmin/workflow-executor package: a standalone Koa HTTP service and CLI that polls the Forest server for pending workflow steps, executes them via dedicated step executors (condition, read-record, update-record, trigger-action, load-related-record, MCP, guidance), and reports results back.
  • Adds WorkflowExecutorProxyRoute to the agent, which forwards /_internal/workflow-executions/:runId GET and POST requests to a configured workflowExecutorUrl, preserving auth headers.
  • Extends @forestadmin/ai-proxy with AiClient, createBaseChatModel, getAiConfiguration, and validateAiConfigurations; propagates mcpServerId through RemoteTool subclasses and Forest integration tool factories.
  • Provides in-memory and Sequelize-backed RunStore implementations, a TTL SchemaCache, and a Runner with lifecycle management, deduplication via InFlightRunRegistry, and graceful shutdown.
  • Risk: adding workflowExecutorUrl to agent options and mounting proxy routes changes the agent's route surface; the new package exposes port 3400 by default and requires FOREST_ENV_SECRET, FOREST_AUTH_SECRET, and AGENT_URL environment variables to start.

Macroscope summarized 018ee05.

matthv and others added 30 commits March 17, 2026 15:00
…premature deps, add smoke test

- Rewrite CLAUDE.md with project overview and architecture principles, remove changelog
- Remove unused dependencies (ai-proxy, sequelize, zod) per YAGNI
- Add smoke test so CI passes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… document system architecture

- Lint now covers src and test directories
- Replace require() with import, use stronger assertion (toHaveLength)
- Add System Architecture section describing Front/Orchestrator/Executor/Agent
- Mark Architecture Principles as planned (not yet implemented)
- Remove redundant test/.gitkeep
- Make index.ts a valid module with export {}

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…erver (#1504)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: alban bertolini <albanb@forestadmin.com>
…ain (#1512)

Co-authored-by: alban bertolini <albanb@forestadmin.com>
- Remove McpClient.tools property, loadTools() returns local array
- Rename closeConnections() → dispose()
- Rename testConnections() → checkConnection()
- Add McpServers type export
- Rename mcpServerConfigs → toolConfigs in create-ai-provider
- Update all tests accordingly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eability

Add sourceId to McpToolRef so that persisted execution data (pendingData,
executionParams) tracks which MCP server provided the tool. This fixes
tool lookup on re-entry (confirmation flow) when multiple servers expose
tools with the same name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… collection schema

- Replace non-null assertion with explicit McpToolNotFoundError when AI
  selects a tool name that doesn't match any available tool
- Resolve related collection name from parent schema before looking up
  the related schema in cache, fixing cases where relation name differs
  from target collection name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merge main into feature branch, resolve conflicts by taking main's
versions of router.ts, create-ai-provider.ts and their tests.
Remove deleted mcp-config-checker.ts. Bump workflow-executor internal
deps to match main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread packages/workflow-executor/src/executors/base-step-executor.ts Outdated
Comment on lines +64 to +67
const status: GuidanceStepOutcome['status'] = ctx.status === 'error' ? 'error' : 'success';

return { type: 'guidance', ...baseFromCtx, status };
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low adapters/run-to-available-step-mapper.ts:64

The guidance branch converts 'awaiting-input' status to 'success' because it uses ctx.status === 'error' ? 'error' : 'success' instead of toRecordStatus. Since GuidanceStepOutcomeSchema explicitly allows 'awaiting-input' via RecordStepStatusSchema, this valid state is incorrectly overwritten. Consider using toRecordStatus(ctx.status) to preserve all valid status values.

Suggested change
const status: GuidanceStepOutcome['status'] = ctx.status === 'error' ? 'error' : 'success';
return { type: 'guidance', ...baseFromCtx, status };
}
if (outcomeType === 'guidance') {
const status: GuidanceStepOutcome['status'] = toRecordStatus(ctx.status);
return { type: 'guidance', ...baseFromCtx, status };
}
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/workflow-executor/src/adapters/run-to-available-step-mapper.ts around lines 64-67:

The `guidance` branch converts `'awaiting-input'` status to `'success'` because it uses `ctx.status === 'error' ? 'error' : 'success'` instead of `toRecordStatus`. Since `GuidanceStepOutcomeSchema` explicitly allows `'awaiting-input'` via `RecordStepStatusSchema`, this valid state is incorrectly overwritten. Consider using `toRecordStatus(ctx.status)` to preserve all valid status values.

Evidence trail:
packages/workflow-executor/src/adapters/run-to-available-step-mapper.ts lines 30-35 (toRecordStatus helper), line 64 (guidance branch using ternary instead of toRecordStatus); packages/workflow-executor/src/types/validated/step-outcome.ts lines 11 (RecordStepStatusSchema allows 'awaiting-input'), lines 57-63 (GuidanceStepOutcomeSchema uses RecordStepStatusSchema); packages/workflow-executor/src/executors/guidance-step-executor.ts line 14 (executor produces 'awaiting-input' for guidance steps)

Comment on lines +185 to +192
await this.context.runStore.saveStepExecution(this.context.runId, {
...existingExecution,
type: 'trigger-action',
stepIndex: this.context.stepIndex,
executionParams: { displayName, name },
executionResult: { success: true, actionResult },
selectedRecordRef,
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium executors/trigger-record-action-step-executor.ts:185

saveFrontendResult spreads existingExecution without setting idempotencyPhase: 'done', so after Branch C completes the step data lacks that field. If the executor restarts, checkIdempotency() (line 53) fails to recognize completion because idempotencyPhase is undefined, causing the step to incorrectly re-enter the confirmation flow.

    await this.context.runStore.saveStepExecution(this.context.runId, {
      ...existingExecution,
      type: 'trigger-action',
      stepIndex: this.context.stepIndex,
      executionParams: { displayName, name },
      executionResult: { success: true, actionResult },
      selectedRecordRef,
+      idempotencyPhase: 'done',
    });
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/workflow-executor/src/executors/trigger-record-action-step-executor.ts around lines 185-192:

`saveFrontendResult` spreads `existingExecution` without setting `idempotencyPhase: 'done'`, so after Branch C completes the step data lacks that field. If the executor restarts, `checkIdempotency()` (line 53) fails to recognize completion because `idempotencyPhase` is undefined, causing the step to incorrectly re-enter the confirmation flow.

Evidence trail:
packages/workflow-executor/src/executors/trigger-record-action-step-executor.ts lines 48-62 (checkIdempotency), 134-142 (Branch C save without idempotencyPhase), 178-195 (saveFrontendResult missing idempotencyPhase:'done'), 146-174 (executeOnExecutor correctly sets idempotencyPhase). packages/workflow-executor/src/executors/update-record-step-executor.ts lines 205-239 (resolveAndUpdate sets idempotencyPhase:'done' even in confirmation flow). packages/workflow-executor/src/types/step-execution-data.ts lines 13-14 (MutatingStepExecutionData.idempotencyPhase definition), lines 76-83 (TriggerRecordActionStepExecutionData extends MutatingStepExecutionData). packages/workflow-executor/src/executors/base-step-executor.ts lines 194-234 (findPendingExecution, patchAndReloadPendingData), lines 238-262 (handleConfirmationFlow).


export default class ConsoleLogger implements Logger {
error(message: string, context: Record<string, unknown>): void {
console.error(JSON.stringify({ message, timestamp: new Date().toISOString(), ...context }));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low adapters/console-logger.ts:5

In error, warn, and info, the spread order { message, timestamp, ...context } allows context to overwrite message and timestamp when the caller passes those keys. This corrupts the log output by replacing the actual message or timestamp with values from context. Reorder the spread to { ...context, message, timestamp } to protect the primary log fields.

-      console.error(JSON.stringify({ message, timestamp: new Date().toISOString(), ...context }));
+      console.error(JSON.stringify({ ...context, message, timestamp: new Date().toISOString() }));
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/workflow-executor/src/adapters/console-logger.ts around line 5:

In `error`, `warn`, and `info`, the spread order `{ message, timestamp, ...context }` allows `context` to overwrite `message` and `timestamp` when the caller passes those keys. This corrupts the log output by replacing the actual message or timestamp with values from context. Reorder the spread to `{ ...context, message, timestamp }` to protect the primary log fields.

Evidence trail:
packages/workflow-executor/src/adapters/console-logger.ts lines 4-14 at REVIEWED_COMMIT — `...context` is spread last in the object literal on lines 5, 9, and 13, meaning any `message` or `timestamp` key in the `context: Record<string, unknown>` parameter overwrites the primary fields. JavaScript spec (ES2018+) defines that later spread properties overwrite earlier ones with the same key.

Comment on lines +14 to +18
function isRetryable(err: unknown): boolean {
const { status } = err as { status?: number };

return typeof status === 'number' && RETRYABLE_STATUS.has(status);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low adapters/with-retry.ts:14

When fn throws null or undefined, the destructuring const { status } = err as { status?: number } throws a TypeError before line 17 can check the status. This crashes the retry mechanism instead of treating the error as non-retryable. Consider adding a null guard before destructuring, or use optional chaining to safely access status.

 function isRetryable(err: unknown): boolean {
-  const { status } = err as { status?: number };
+  if (err == null) return false;
+  const status = (err as { status?: number }).status;
 
   return typeof status === 'number' && RETRYABLE_STATUS.has(status);
 }
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/workflow-executor/src/adapters/with-retry.ts around lines 14-18:

When `fn` throws `null` or `undefined`, the destructuring `const { status } = err as { status?: number }` throws a `TypeError` before line 17 can check the status. This crashes the retry mechanism instead of treating the error as non-retryable. Consider adding a null guard before destructuring, or use optional chaining to safely access `status`.

Evidence trail:
packages/workflow-executor/src/adapters/with-retry.ts lines 14-18 at REVIEWED_COMMIT: `isRetryable` destructures `err` without null guard. JavaScript behavior: `const { x } = null` throws TypeError. Callers at packages/workflow-executor/src/adapters/forest-server-workflow-port.ts:231, packages/workflow-executor/src/adapters/forestadmin-client-activity-log-port.ts:26,58,80.

Comment thread packages/ai-proxy/src/ai-client.ts Outdated
Comment on lines +39 to +49
async loadRemoteTools(
mcpConfig: McpConfiguration,
): Promise<Awaited<ReturnType<McpClient['loadTools']>>> {
await this.disposeMcpClient('Error closing previous MCP connection');

const newClient = new McpClient(mcpConfig, this.logger);
const tools = await newClient.loadTools();
this.mcpClient = newClient;

return tools;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium src/ai-client.ts:39

If newClient.loadTools() throws, the McpClient instance created on line 44 is never disposed. Since this.mcpClient is only assigned after loadTools() succeeds, a failed call leaves newClient dangling with open connections, causing a resource leak. Consider wrapping loadTools() in a try/finally to ensure cleanup on failure.

  async loadRemoteTools(
    mcpConfig: McpConfiguration,
  ): Promise<Awaited<ReturnType<McpClient['loadTools']>>> {
    await this.disposeMcpClient('Error closing previous MCP connection');

    const newClient = new McpClient(mcpConfig, this.logger);
+    try {
      const tools = await newClient.loadTools();
      this.mcpClient = newClient;

      return tools;
+    } catch (error) {
+      await newClient.dispose();
+      throw error;
+    }
  }
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/ai-proxy/src/ai-client.ts around lines 39-49:

If `newClient.loadTools()` throws, the `McpClient` instance created on line 44 is never disposed. Since `this.mcpClient` is only assigned after `loadTools()` succeeds, a failed call leaves `newClient` dangling with open connections, causing a resource leak. Consider wrapping `loadTools()` in a try/finally to ensure cleanup on failure.

Evidence trail:
packages/ai-proxy/src/ai-client.ts lines 39-49 (REVIEWED_COMMIT): loadRemoteTools creates newClient on line 44, calls loadTools() on line 45, assigns to this.mcpClient on line 46. No try/finally wrapping.
packages/ai-proxy/src/mcp-client.ts lines 24-37 (REVIEWED_COMMIT): McpClient constructor creates MultiServerMCPClient instances stored in this.mcpClients.
packages/ai-proxy/src/mcp-client.ts lines 95-118 (REVIEWED_COMMIT): dispose() method calls client.close() on each MultiServerMCPClient - explicit cleanup needed.
packages/ai-proxy/src/ai-client.ts lines 55-66 (REVIEWED_COMMIT): disposeMcpClient only disposes this.mcpClient, not a local newClient variable.

});
}

async markFailed(handle: ActivityLogHandle, errorMessage: string): Promise<void> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Low adapters/forestadmin-client-activity-log-port.ts:77

In markFailed, the errorMessage parameter is accepted but never passed to updateActivityLogStatus. The status is set to 'failed' without persisting the error context to the audit log, so the failure reason is lost. The message is only used for local logging when the update itself fails.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/workflow-executor/src/adapters/forestadmin-client-activity-log-port.ts around line 77:

In `markFailed`, the `errorMessage` parameter is accepted but never passed to `updateActivityLogStatus`. The status is set to `'failed'` without persisting the error context to the audit log, so the failure reason is lost. The message is only used for local logging when the update itself fails.

Evidence trail:
packages/workflow-executor/src/adapters/forestadmin-client-activity-log-port.ts lines 77-98 (markFailed method), packages/workflow-executor/src/ports/activity-log-port.ts line 20 (ActivityLogPort interface), packages/forestadmin-client/src/types.ts lines 254-258 (UpdateActivityLogStatusParams - no errorMessage field), packages/forestadmin-client/src/activity-logs/index.ts lines 97-107 (updateActivityLogStatus implementation), packages/forestadmin-client/CHANGELOG.md line 43 (intentional removal of errorMessage from updateActivityLogStatus)

Scra3 and others added 9 commits May 28, 2026 09:30
…mpat (#1604)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lient (#1599)

* fix(ai-proxy): route Forest connectors via tool-provider split in AiClient

AiClient.loadRemoteTools was instantiating McpClient directly, forwarding
every config (including ForestIntegrationConfig entries like Zendesk) to
MultiServerMCPClient. Forest connectors lack the stdio/HTTP transport
fields, so @langchain/mcp-adapters threw a Zod union error and any
workflow run touching a Forest-hosted connector crashed before the step
could execute. Delegate to createToolProviders so Forest connectors are
routed to ForestIntegrationClient — same path Router and ToolSourceChecker
already use. Fixes PRD-400.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ver (PRD-363) (#1607)

Before this change, every MCP step opened connections to all configured MCP servers
and listed their tools, then filtered in-memory by `step.mcpServerId`. With N
configured servers, an MCP step paid N× the connect/list-tools cost and hit every
upstream server unnecessarily.

- Extract a `RemoteToolFetcher` module that scopes configs by `cfg.id === mcpServerId`
  (the persisted DB id from PRD-360) before calling `loadRemoteTools`. Only the
  targeted server is contacted.
- Make `mcpServerId` required on `McpStepDefinition` (zod `z.string().min(1)`);
  remove the pre-PRD-360 legacy fallback. A missing id is now a malformed run, not
  a silent broad fetch.
- Unify the partial-failure check across MCP and Forest connectors via
  `tool.mcpServerId`; a Forest connector that fails to load is now flagged in
  `failedConfigNames` instead of surfacing as a generic `NoMcpToolsError`.
- Collapse `McpStepExecutor.getFilteredTools` into `requireTools` (pre-scoped list →
  assert non-empty); make `NoMcpToolsError`'s message actionable.
- Distinct ops diagnostics: "no configs at all" vs "no match" warns, and
  `failedConfigNames` error log on partial loads.

fixes PRD-363
---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tep logs (PRD-418) (#1616)

* feat(workflow-executor): include mcpServerId + mcpServerName in MCP step logs (PRD-418)

Adds a getExtraLogContext() hook on BaseStepExecutor (default {}) merged into
logCtx, overridden by McpStepExecutor to stamp every MCP step log line with
mcpServerId and mcpServerName. RemoteToolFetcher.fetch() now resolves the
serverName from the scoped config Record key and returns { tools, serverName },
forwarded to the executor via StepExecutorFactory. Diagnostic log lines in the
fetcher also carry mcpServerName.

fixes PRD-418

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +77 to +96
async getRecord({ collection, id, fields }: GetRecordQuery, user: StepUser): Promise<RecordData> {
return this.callAgent('getRecord', async () => {
const client = this.createClient(user);
const schema = this.resolveSchema(collection);
const records = await client.collection(collection).list<Record<string, unknown>>({
filters: buildPkFilter(schema.primaryKeyFields, id),
pagination: { size: 1, number: 1 },
...(fields?.length && { fields }),
});

if (records.length === 0) {
throw new RecordNotFoundError(collection, id.join('|'));
}

return {
collectionName: collection,
recordId: id,
values: restoreFieldNames(records[0], fields),
};
});
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium adapters/agent-client-agent-port.ts:77

When getRecord is called without fields, restoreFieldNames receives undefined and returns raw camelCase keys from the agent-client. Schema field names like card_number are returned as cardNumber, breaking lookups by schema field name. Consider passing all schema field names to restoreFieldNames when fields is undefined, similar to how getRelatedData handles this on lines 133-134.

-      return {
-        collectionName: collection,
-        recordId: id,
-        values: restoreFieldNames(records[0], fields),
-      };
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/workflow-executor/src/adapters/agent-client-agent-port.ts around lines 77-96:

When `getRecord` is called without `fields`, `restoreFieldNames` receives `undefined` and returns raw camelCase keys from the agent-client. Schema field names like `card_number` are returned as `cardNumber`, breaking lookups by schema field name. Consider passing all schema field names to `restoreFieldNames` when `fields` is undefined, similar to how `getRelatedData` handles this on lines 133-134.

Evidence trail:
packages/workflow-executor/src/adapters/agent-client-agent-port.ts lines 33-46 (restoreFieldNames function, returns values as-is when originalFieldNames is undefined at line 37), line 77-97 (getRecord passes `fields` directly which can be undefined at line 94), lines 120-143 (getRelatedData always passes schema field names at lines 132-134), lines 26-28 (toCamelCase conversion), lines 30-32 (comment explaining the camelCase issue)

nbouliol and others added 5 commits June 3, 2026 13:39
…ing provider errors [PRD-409] (#1609)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: alban bertolini <albanb@forestadmin.com>
…s (PRD-432) (#1620)

The executor mints the JWT it sends to the agent from StepUser (camelCase).
Ruby/Python agents splat JWT claims straight into Caller.new, which requires
snake_case kwargs (first_name, last_name, rendering_id, permission_level), so
the call failed with a 500 ArgumentError. The TS agent reads camelCase. There
is no single canonical casing, so emit both: the Ruby Caller absorbs the extra
camelCase keys via **_extra_args and the TS agent ignores the snake_case ones.

fixes PRD-432

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…e [PRD-430] (#1621)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants