Skip to content

Add data-availability triggering for TableTriggers#237

Draft
ryannedolan wants to merge 1 commit into
mainfrom
tabletrigger-input-watermarks
Draft

Add data-availability triggering for TableTriggers#237
ryannedolan wants to merge 1 commit into
mainfrom
tabletrigger-input-watermarks

Conversation

@ryannedolan

@ryannedolan ryannedolan commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • InputWatermarkProvider SPI
  • FIRE TRIGGER ... FROM ... TO
  • Backfill support
  • Demo Kafka event source

Details

Introduces a uniform way for TableTriggers to fire when input is complete through a data-time frontier. This is accomplished via a new SPI InputWatermarkProvider, which enables adapters to provide change events for specific storage systems. The Kafka adapter is extended as an example.

Also implements on-demand backfills and automatic repair of late/out-of-order data.

Both features make use of a new source-agnostic controller, which applies the SPI and generic backfill logic. No bespoke source-specific controller is required.

  • InputWatermarkProvider SPI (+ InputWatermarkService, TriggerInput, DataChange): a provider reports completeThrough(input) -> data-time completeness watermark, discovered via ServiceLoader. The generic TableTriggerReconciler advances the trigger's cursor to the frontier and launches the Job over the new window [watermark, timestamp]; sources without a provider keep firing on cron/manual.

  • FIRE TRIGGER ... FROM ... TO ...: one-off backfills over an explicit output window, run as a separate Job that never moves the incremental cursor. Bounds accept absolute instants/dates, " UNIT AGO", and NOW; the end is capped at the watermark.

  • Late-change repair: when a provider reports a change behind the watermark, the reconciler enqueues a bounded backfill for that window while keeping the user-facing watermark a monotone forward frontier (status.lateWatermark tracks arrival order internally).

  • Kafka event source: KafkaInputWatermark reports a topic's max record timestamp as its frontier (AdminClient.listOffsets/maxTimestamp), with a sample trigger.

Testing Done

None yet.

Introduce a uniform way for a TableTrigger to fire when its input is
*complete* through a data-time frontier, rather than on wall-clock cron ticks
alone, plus on-demand backfills and automatic repair of late/out-of-order data.
This is source-agnostic: each external system implements one SPI method instead
of a bespoke controller.

- InputWatermarkProvider SPI (+ InputWatermarkService, TriggerInput, DataChange):
  a provider reports completeThrough(input) -> data-time completeness watermark,
  discovered via ServiceLoader. The generic TableTriggerReconciler advances the
  trigger's cursor to the frontier and launches the Job over the new window
  [watermark, timestamp]; sources without a provider keep firing on cron/manual.

- FIRE TRIGGER ... FROM ... TO ...: one-off backfills over an explicit output
  window, run as a separate Job that never moves the incremental cursor. Bounds
  accept absolute instants/dates, "<n> UNIT AGO", and NOW; the end is capped at
  the watermark.

- Late-change repair: when a provider reports a change behind the watermark, the
  reconciler enqueues a bounded backfill for that window while keeping the
  user-facing watermark a monotone forward frontier (status.lateWatermark tracks
  arrival order internally).

- Kafka event source: KafkaInputWatermark reports a topic's max record timestamp
  as its frontier (AdminClient.listOffsets/maxTimestamp), with a sample trigger.

- No lookahead/lookback knobs: a completeness watermark makes "look ahead"
  unsound and late-repair subsumes "look back", so jobs own any wider trailing
  read in their own SQL and are handed {{watermark}}/{{timestamp}}.

The DDL parser is regenerated from the codegen sources (config.fmpp,
includes/parserImpls.ftl) via the Apache Calcite process in
hoptimator-jdbc/src/main/codegen/README.md; the FIRE grammar is added and no
LOOK BACK/AHEAD grammar is present.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown

Code Coverage

Overall Project 84.1% -0.8% 🟢
Files changed 74.29% 🟢

File Coverage
Trigger.java 100% 🟢
TableTriggerReconciler.java 92.51% -7.13% 🟢
K8sTriggerDeployer.java 92.42% -2.21% 🟢
HoptimatorDdlExecutor.java 84.44% -3.48%
DataChange.java 73.17% -26.83% 🟢
InputWatermarkService.java 69.7% -30.3% 🟢
KafkaInputWatermark.java 64.25% -35.75% 🟢
TriggerInput.java 20.22% -79.78%
InputWatermarkProvider.java 0%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant