
Merge the HSM to CHASM migration feature branch #9865

Merged
bergundy merged 22 commits into main from nexus/hsm-to-chasm-migration
Apr 10, 2026

Conversation


@bergundy bergundy commented Apr 8, 2026

What changed?

Merge the HSM to CHASM migration feature branch. The work is not yet complete, but it is safe to merge because it requires no public API changes and is disabled by default.

Why?

Avoid the need to keep rebasing.

How did you test it?

Tested as part of ongoing feature work.

Potential risks

None yet; the behavior is gated behind a dynamic config that is disabled by default.
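
For reviewers who want a picture of the gating, here is a minimal sketch of a code path guarded by a dynamic config flag that defaults to off. `BoolSetting`, `Config`, and `handleCommand` are hypothetical names for illustration and are not the server's actual dynamic config API; only the `ChasmEnabled` field name comes from this PR.

```go
package main

import "fmt"

// BoolSetting is a hypothetical stand-in for a dynamic config value. It is
// evaluated on every call so the flag can be flipped at runtime.
type BoolSetting func() bool

// Config carries the gate. The field name follows the review thread in this
// PR; its type here is an assumption.
type Config struct {
	ChasmEnabled BoolSetting
}

// handleCommand takes the new path only when the gate is explicitly on.
// A missing or false setting falls back to the legacy path.
func handleCommand(cfg Config) string {
	if cfg.ChasmEnabled != nil && cfg.ChasmEnabled() {
		return "chasm"
	}
	return "hsm"
}

func main() {
	off := Config{ChasmEnabled: func() bool { return false }}
	on := Config{ChasmEnabled: func() bool { return true }}
	fmt.Println(handleCommand(off), handleCommand(on))
}
```

With a gate like this, the new code is unreachable until an operator opts in, which is the property the risk assessment relies on.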

@bergundy bergundy requested review from a team as code owners April 8, 2026 17:24
@stephanos stephanos force-pushed the nexus/hsm-to-chasm-migration branch from ecee5f2 to 634fa80 Compare April 8, 2026 19:53
@bergundy bergundy force-pushed the nexus/hsm-to-chasm-migration branch from 7fbdb9f to 5be312b Compare April 9, 2026 05:28
@stephanos stephanos force-pushed the nexus/hsm-to-chasm-migration branch 3 times, most recently from 978ec33 to 12007cf Compare April 9, 2026 15:05
Comment thread service/history/configs/config.go Outdated
HistoryCacheTTL: dynamicconfig.HistoryCacheTTL.Get(dc),
HistoryCacheNonUserContextLockTimeout: dynamicconfig.HistoryCacheNonUserContextLockTimeout.Get(dc),
HistoryCacheBackgroundEvict: dynamicconfig.HistoryCacheBackgroundEvict.Get(dc),
ChasmEnabled: dynamicconfig.EnableChasm.Get(dc),

This is already defined below, at L484 (`EnableChasm`).

gow and others added 17 commits April 9, 2026 12:09
- Added workflow command handler registry to CHASM's workflow library.
- Integrated CHASM's workflow library into workflow completion handler.

Migrating Nexus from HSM to CHASM.

Tests will be ported over once actual command handler implementations
are added.
 - Added `OperationState` proto fields
 - Migrated nexus operation state transitions.

Migrating nexus from HSM to CHASM

- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

N/A. This code path is currently unreachable.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches persisted proto schemas and core state-transition logic for
retries/timeouts; while executors are still unimplemented, any
activation of this path could impact task scheduling and retry
semantics.
>
> **Overview**
> Migrates Nexus operation lifecycle handling to CHASM by implementing
the operation state machine transitions to emit invocation,
backoff-retry, and timeout tasks and to record attempt metadata (last
failure/completion time, next retry time, operation token).
>
> Expands the `OperationState` and task protos to persist
endpoint/operation identifiers, scheduling timestamps, retry/attempt
fields, and separate timeout task types (`schedule-to-start`,
`start-to-close`, `schedule-to-close`), and wires new timeout task
executors through Fx and the library task registry. Adds unit tests
covering the new transition behavior and task scheduling.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4ab5fd0. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
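
The retry portion of the state machine described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code in this branch: the `Operation`, `Task`, and `Status` types and both transition methods are hypothetical, and the real implementation persists this state in protos.

```go
package main

import (
	"fmt"
	"time"
)

// Status is a simplified operation status (hypothetical).
type Status int

const (
	Scheduled Status = iota
	BackingOff
	Started
)

// Task is a hypothetical record of work the state machine asks to be scheduled.
type Task struct {
	Kind     string
	Deadline time.Time
}

// Operation holds the attempt metadata mentioned in the summary.
type Operation struct {
	Status        Status
	Attempt       int32
	LastFailure   string
	NextRetryTime time.Time
}

// AttemptFailed moves Scheduled -> BackingOff, records attempt metadata, and
// emits a backoff task that fires at the next retry time.
func (o *Operation) AttemptFailed(now time.Time, cause string, backoff time.Duration) (Task, error) {
	if o.Status != Scheduled {
		return Task{}, fmt.Errorf("cannot fail an attempt from status %d", o.Status)
	}
	o.Status = BackingOff
	o.Attempt++
	o.LastFailure = cause
	o.NextRetryTime = now.Add(backoff)
	return Task{Kind: "backoff", Deadline: o.NextRetryTime}, nil
}

// Rescheduled moves BackingOff -> Scheduled and emits a new invocation task.
func (o *Operation) Rescheduled() (Task, error) {
	if o.Status != BackingOff {
		return Task{}, fmt.Errorf("cannot reschedule from status %d", o.Status)
	}
	o.Status = Scheduled
	o.NextRetryTime = time.Time{}
	return Task{Kind: "invocation"}, nil
}

func main() {
	op := &Operation{}
	task, err := op.AttemptFailed(time.Now(), "handler unavailable", time.Second)
	fmt.Println(task.Kind, op.Attempt, err)
	task, err = op.Rescheduled()
	fmt.Println(task.Kind, err)
}
```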
Ported command handler for Nexus "schedule" command from HSM to CHASM.

CHASM migration.

- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Chetan Gowda <chetan.gowda@temporal.io>
Co-authored-by: Chetan Gowda <gow@users.noreply.github.com>
Co-authored-by: Shivam <57200924+Shivs11@users.noreply.github.com>
Ported command handler for Nexus "cancel" command from HSM to CHASM.

CHASM migration.

- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Roey Berman <roey.berman@gmail.com>
## What changed?

Replaces workflow-specific fields (i.e. `scheduled_event_token` and
`requested_event_id`) in the CHASM Nexus operation state proto with a
generic field.

## Why?

Ensure CHASM Nexus operation state has no workflow-specifics.

## How did you test it?
- [ ] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…flowregistry (#9474)

## What changed?
Just a package rename. No other code change.

## Why?
Migrating Nexus from HSM to CHASM

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Primarily a package/API rename and dependency wiring update; low
behavioral risk, but broad mechanical changes could cause compile-time
breakage if any call sites were missed.
> 
> **Overview**
> Renames the CHASM workflow command registry package from
`chasm/lib/workflow/command` to `chasm/lib/workflow/workflowregistry`
and updates its public API (`RegisterCommandHandler`, `CommandHandler`,
`CommandHandlerOptions`, `ErrCommandNotSupported`).
> 
> Propagates the rename through Nexus operation command handlers/tests,
History `RespondWorkflowTaskCompleted` CHASM fallback path, engine/fx
wiring, and related Nexus components so CHASM command handling continues
to resolve and invoke handlers via the new registry.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e4740af. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
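
As a rough illustration of what a command handler registry of this shape does, here is a minimal sketch. The identifiers `RegisterCommandHandler`, `CommandHandler`, and `ErrCommandNotSupported` are taken from the summary above, but their signatures here, as well as `CommandType` and `Dispatch`, are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCommandNotSupported is returned when no handler is registered.
var ErrCommandNotSupported = errors.New("command not supported")

// CommandType and CommandHandler use simplified, assumed signatures.
type CommandType string

type CommandHandler func(attributes string) error

// Registry maps command types to their handlers.
type Registry struct {
	handlers map[CommandType]CommandHandler
}

func NewRegistry() *Registry {
	return &Registry{handlers: make(map[CommandType]CommandHandler)}
}

// RegisterCommandHandler rejects duplicate registrations so wiring mistakes
// surface at startup rather than at dispatch time.
func (r *Registry) RegisterCommandHandler(t CommandType, h CommandHandler) error {
	if _, exists := r.handlers[t]; exists {
		return fmt.Errorf("handler already registered for command %q", t)
	}
	r.handlers[t] = h
	return nil
}

// Dispatch (a hypothetical name) looks up and invokes the handler.
func (r *Registry) Dispatch(t CommandType, attributes string) error {
	h, ok := r.handlers[t]
	if !ok {
		return ErrCommandNotSupported
	}
	return h(attributes)
}

func main() {
	r := NewRegistry()
	_ = r.RegisterCommandHandler("ScheduleNexusOperation", func(attrs string) error {
		fmt.Println("scheduling", attrs)
		return nil
	})
	fmt.Println(r.Dispatch("ScheduleNexusOperation", "endpoint=payments"))
	fmt.Println(errors.Is(r.Dispatch("Unknown", ""), ErrCommandNotSupported))
}
```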
## What changed?
Migrating the Nexus history event Registry and Definition. I've also moved
all the event implementations, with their bodies commented out. I will
replace the implementations of `Apply()` and `CherryPick()` in follow-up
PRs.
Depends on #9474

## Why?
Migrating Nexus from HSM to CHASM.

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Introduces new history event registration and lookup paths for Nexus
operations; since the `Apply` implementations are currently stubs,
there’s some risk of silently skipping state transitions during
replication/reset until follow-up PRs complete the logic.
> 
> **Overview**
> Adds first-class **history event definitions** to
`workflowregistry.Registry` via a new `EventDefinition` interface (with
`Apply` and `CherryPick`) and
`RegisterEventDefinition`/`EventDefinition` APIs.
> 
> Wires Nexus operation event registration into the `fx` module and
introduces `events.go` with definitions for Nexus lifecycle events
(scheduled/cancel-requested/started/completed/failed/canceled/timed-out), including basic
workflow-task-trigger flags and `CherryPick` exclusion handling (notably
`RESET_REAPPLY_EXCLUDE_TYPE_NEXUS`), while leaving `Apply` bodies as
TODO stubs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
e4bd20b. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
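
To make the `Apply`/`CherryPick` split concrete, here is a minimal sketch of an event definition registry. Only the names `EventDefinition`, `RegisterEventDefinition`, `Apply`, and `CherryPick` come from the summary above; the method signatures, the `Event` and `Workflow` types, and the `"nexus"` exclusion key are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotCherryPickable signals that an event is excluded from reapply.
var ErrNotCherryPickable = errors.New("event not cherry-pickable")

// Event and Workflow are simplified stand-ins for history events and state.
type Event struct {
	Type             string
	ScheduledEventID int64
}

type Workflow struct {
	Pending map[int64]string // pending operations keyed by scheduled event ID
}

// EventDefinition applies an event to state, and decides whether the event
// may be reapplied (cherry-picked) onto another branch, e.g. during reset.
type EventDefinition interface {
	Type() string
	Apply(wf *Workflow, e Event) error
	CherryPick(wf *Workflow, e Event, exclude map[string]bool) error
}

type Registry struct {
	defs map[string]EventDefinition
}

func NewRegistry() *Registry {
	return &Registry{defs: make(map[string]EventDefinition)}
}

func (r *Registry) RegisterEventDefinition(d EventDefinition) error {
	if _, exists := r.defs[d.Type()]; exists {
		return fmt.Errorf("event definition %q already registered", d.Type())
	}
	r.defs[d.Type()] = d
	return nil
}

func (r *Registry) EventDefinition(t string) (EventDefinition, bool) {
	d, ok := r.defs[t]
	return d, ok
}

// scheduledDefinition is a hypothetical definition for a "scheduled" event.
type scheduledDefinition struct{}

func (scheduledDefinition) Type() string { return "NexusOperationScheduled" }

func (scheduledDefinition) Apply(wf *Workflow, e Event) error {
	wf.Pending[e.ScheduledEventID] = "scheduled"
	return nil
}

func (d scheduledDefinition) CherryPick(wf *Workflow, e Event, exclude map[string]bool) error {
	if exclude["nexus"] {
		return ErrNotCherryPickable
	}
	return d.Apply(wf, e)
}

func main() {
	r := NewRegistry()
	_ = r.RegisterEventDefinition(scheduledDefinition{})
	wf := &Workflow{Pending: map[int64]string{}}
	if def, ok := r.EventDefinition("NexusOperationScheduled"); ok {
		_ = def.Apply(wf, Event{Type: def.Type(), ScheduledEventID: 5})
	}
	fmt.Println(wf.Pending)
}
```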
## What changed?
This PR adds methods to the workflow component to handle Nexus events.

## Why?
Migrating Nexus from HSM to CHASM

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Touches the workflow-task completion path and CHASM workflow wiring
(registry injection and history-event application), so misregistration
or missing context could cause runtime command failures despite largely
being additive/refactor changes.
> 
> **Overview**
> Adds CHASM `Workflow` helpers to *emit and apply* Nexus operation
lifecycle events (started/completed/failed/canceled/timed-out),
including consistent failure wrapping via
`NexusOperationExecutionFailure`.
> 
> Refactors CHASM workflow command/event registration by moving
`workflowregistry` into `chasm/lib/workflow` as `Registry`, injecting it
into CHASM context, and updating Nexus workflow command handlers to use
`AddAndApplyHistoryEvent` so command-emitted history events immediately
run their registered event definitions. Updates wiring/tests/callers
across services to construct `NewLibrary(NewRegistry())` and use the new
types/errors.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
70936c6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## What changed?
Migrated the `Apply()` methods of all event definitions from HSM to CHASM.
Also migrated the unit tests.

## Why?
HSM to CHASM migration

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)


<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes Nexus operation lifecycle handling by moving
scheduling/cancellation and terminal event application into CHASM event
definitions and adjusting state transition semantics; regressions could
affect operation task emission, cancellation timing, and cleanup during
replay/reset.
> 
> **Overview**
> Implements CHASM-based Nexus operation event `Apply()` handlers:
scheduled/cancel-requested/started now create or update the in-memory
operation component (including spawning/scheduling a cancellation child
once an operation token exists), and terminal events
(completed/failed/canceled/timed-out) transition the operation then
remove it from the workflow.
> 
> Refactors workflow Nexus operation storage to be keyed by
`ScheduledEventId` (`int64`), simplifies command handlers to only emit
history events (letting event definitions create/update components), and
expands cancellation/operation state machines with concrete task
emission and retry/backoff metadata. Updates `chasm.Transition.Apply` to
run transition logic before mutating state (enabling source-state
inspection) and adds new CHASM-focused unit tests for the migrated event
definitions and updated transition behavior.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
c0f4e55. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
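
The ordering change to `chasm.Transition.Apply` mentioned above (run the transition logic first, mutate state second) can be sketched as below. This is a simplified, non-generic illustration; the `Op`, `Status`, and `Transition` types here are hypothetical and do not mirror the real `chasm` package.

```go
package main

import "fmt"

type Status string

// Op is a minimal component with a status (hypothetical).
type Op struct {
	Status Status
}

// Transition describes a move into Dest from any of Sources.
type Transition struct {
	Sources []Status
	Dest    Status
	// Fn runs while op.Status still holds the source state.
	Fn func(op *Op) error
}

// Apply validates the source state, runs the transition logic, and only then
// writes the destination state. If Fn fails, the status is left untouched.
func (t Transition) Apply(op *Op) error {
	allowed := false
	for _, s := range t.Sources {
		if s == op.Status {
			allowed = true
			break
		}
	}
	if !allowed {
		return fmt.Errorf("invalid transition from %q to %q", op.Status, t.Dest)
	}
	if err := t.Fn(op); err != nil {
		return err
	}
	op.Status = t.Dest
	return nil
}

func main() {
	started := Transition{
		Sources: []Status{"scheduled", "backing_off"},
		Dest:    "started",
		Fn: func(op *Op) error {
			fmt.Println("transitioning from", op.Status)
			return nil
		},
	}
	op := &Op{Status: "backing_off"}
	fmt.Println(started.Apply(op), op.Status)
}
```

Because `Fn` observes the source status, a handler can branch on where the operation came from (for example, scheduled versus backing off) without a separate field to remember it.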
This PR migrates the Nexus operation invocation task handler from the HSM
version to CHASM.

Migrating from HSM to CHASM.

- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [x] added new unit test(s)
- [ ] added new functional test(s)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Introduces a new CHASM-based Nexus `StartOperation` execution path
with endpoint lookup, callback URL/token generation, and error
classification; mistakes could cause failed invocations, incorrect
retries/timeouts, or misrouted callbacks. Risk is mitigated somewhat by
strict task validation and added unit coverage, but the change touches
critical workflow/history integration and outbound request handling.
>
> **Overview**
> Migrates Nexus operation invocation execution to CHASM by implementing
`OperationInvocationTaskHandler.Validate/Execute` end-to-end, including
endpoint resolution (ID with name fallback), callback URL selection
(system vs templated), callback token generation, timeout budgeting,
outbound StartOperation calls (HTTP or internal history service),
metrics/logging, and classification of results into operation state
transitions.
>
> Adds supporting plumbing:
`OperationStore.NexusOperationInvocationData` and workflow
implementation that loads invocation input/headers from the scheduled
history event, plus a new
`MSPointer.LoadHistoryEvent`/`NodeBackend.LoadHistoryEvent` API.
Configuration is extended to parse `CallbackURLTemplate` into a
`*template.Template`, add `UseSystemCallbackURL`, and pass
`NumHistoryShards` for internal routing; new helper utilities centralize
callback building, error/failure conversion, and internal/HTTP start
logic.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4b13977. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
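
The callback URL selection described above (system URL versus a parsed template) might look roughly like this. It is a sketch using the standard `text/template` package; `systemCallbackURL` is a placeholder value, and `callbackVars`, `parseCallbackTemplate`, and `buildCallbackURL` are hypothetical helpers, not functions from this branch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// systemCallbackURL is a placeholder for illustration only.
const systemCallbackURL = "temporal://system"

// callbackVars holds the data exposed to the template (assumed shape).
type callbackVars struct {
	NamespaceName string
}

// parseCallbackTemplate parses the configured string once, at config load
// time, so a malformed template is reported early.
func parseCallbackTemplate(s string) (*template.Template, error) {
	return template.New("callback-url").Parse(s)
}

// buildCallbackURL returns the system URL when requested, and otherwise
// renders the template for the given namespace.
func buildCallbackURL(useSystem bool, tmpl *template.Template, namespace string) (string, error) {
	if useSystem {
		return systemCallbackURL, nil
	}
	if tmpl == nil {
		return "", errors.New("callback URL template is not configured")
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, callbackVars{NamespaceName: namespace}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	tmpl, err := parseCallbackTemplate("https://example.test/namespaces/{{.NamespaceName}}/nexus/callback")
	if err != nil {
		panic(err)
	}
	fmt.Println(buildCallbackURL(false, tmpl, "default"))
	fmt.Println(buildCallbackURL(true, nil, "default"))
}
```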
#9790)

## What changed?
Minor fixes.

## Why?
These were missing or misplaced previously.

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)



<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes task routing and cancellation scheduling in the Nexus
operation state machine, which can affect execution flow and delivery to
outbound queues. While localized, incorrect destinations or transition
sequencing could cause stuck or misrouted operations.
> 
> **Overview**
> Ensures Nexus invocation tasks are routed to the correct outbound
queue by setting `TaskAttributes.Destination` to the operation endpoint
when scheduling and when rescheduling after backoff.
> 
> Moves the "cancellation already requested" handling from the workflow
`Started` event handler into `TransitionStarted`, so pending
cancellations are automatically scheduled as soon as an operation token
becomes available.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4eddec6. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
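
A minimal sketch of the two behaviors described above: routing tasks by endpoint, and scheduling a pending cancellation inside the started transition. The `Task` and `Operation` types and their fields are hypothetical; only the `TransitionStarted` name and the idea of a task destination come from the summary.

```go
package main

import "fmt"

// Task is a hypothetical outbound task. Destination selects the outbound
// queue, so it must match the operation's endpoint.
type Task struct {
	Kind        string
	Destination string
}

// Operation is a simplified operation component.
type Operation struct {
	Endpoint        string
	Token           string
	CancelRequested bool
}

// InvocationTask builds an invocation task routed to the operation endpoint.
// The same routing applies when rescheduling after a backoff.
func (o *Operation) InvocationTask() Task {
	return Task{Kind: "invocation", Destination: o.Endpoint}
}

// TransitionStarted records the operation token. If cancellation was already
// requested, it emits the cancellation task now that a token is available.
func (o *Operation) TransitionStarted(token string) []Task {
	o.Token = token
	if !o.CancelRequested {
		return nil
	}
	return []Task{{Kind: "cancellation", Destination: o.Endpoint}}
}

func main() {
	op := &Operation{Endpoint: "payments", CancelRequested: true}
	fmt.Println(op.InvocationTask())
	fmt.Println(op.TransitionStarted("token-1"))
}
```

Keeping the check inside the transition means every caller that starts an operation gets the same cancellation behavior, rather than relying on one event handler to remember it.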
## What changed?

- [[Nexus-Chasm] Improve nexus operation dynamic config](59a7068)
- [[Nexus-Chasm] Improve operation state machine and task handlers](c6f1fa8)
- [[Nexus-Chasm] Refactor workflow registry and nexus workflow integration](45d0cfc)

## Why?

- Cleanup
- Improved test coverage
- Bug fixes
@bergundy bergundy force-pushed the nexus/hsm-to-chasm-migration branch from 12007cf to 6ae06cc Compare April 9, 2026 19:13
@bergundy bergundy force-pushed the nexus/hsm-to-chasm-migration branch from 2138a0e to 3206e99 Compare April 9, 2026 20:49
bergundy and others added 3 commits April 9, 2026 14:20
## What changed?

Port the invocation executor tests from
`components/nexusoperations/executor_tests.go` to CHASM.

## Why?

Test coverage was missing previously.
@bergundy bergundy merged commit 8ccf1af into main Apr 10, 2026
46 checks passed
@bergundy bergundy deleted the nexus/hsm-to-chasm-migration branch April 10, 2026 18:53

5 participants