Add data-availability triggering for TableTriggers#237
Draft
ryannedolan wants to merge 1 commit into
Draft
Conversation
Introduce a uniform way for a TableTrigger to fire when its input is
*complete* through a data-time frontier, rather than on wall-clock cron ticks
alone, plus on-demand backfills and automatic repair of late/out-of-order data.
This is source-agnostic: each external system implements one SPI method instead
of a bespoke controller.
- InputWatermarkProvider SPI (+ InputWatermarkService, TriggerInput, DataChange):
a provider reports completeThrough(input) -> data-time completeness watermark,
discovered via ServiceLoader. The generic TableTriggerReconciler advances the
trigger's cursor to the frontier and launches the Job over the new window
[watermark, timestamp]; sources without a provider keep firing on cron/manual.
- FIRE TRIGGER ... FROM ... TO ...: one-off backfills over an explicit output
window, run as a separate Job that never moves the incremental cursor. Bounds
accept absolute instants/dates, "<n> UNIT AGO", and NOW; the end is capped at
the watermark.
- Late-change repair: when a provider reports a change behind the watermark, the
reconciler enqueues a bounded backfill for that window while keeping the
user-facing watermark a monotone forward frontier (status.lateWatermark tracks
arrival order internally).
- Kafka event source: KafkaInputWatermark reports a topic's max record timestamp
as its frontier (AdminClient.listOffsets/maxTimestamp), with a sample trigger.
- No lookahead/lookback knobs: a completeness watermark makes "look ahead"
unsound and late-repair subsumes "look back", so jobs own any wider trailing
read in their own SQL and are handed {{watermark}}/{{timestamp}}.
The DDL parser is regenerated from the codegen sources (config.fmpp,
includes/parserImpls.ftl) via the Apache Calcite process in
hoptimator-jdbc/src/main/codegen/README.md; the FIRE grammar is added and no
LOOK BACK/AHEAD grammar is present.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Code Coverage
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
FIRE TRIGGER ... FROM ... TODetails
Introduces a uniform way for
TableTriggersto fire when input is complete through a data-time frontier. This is accomplished via a new SPIInputWatermarkProvider, which enables adapters to provide change events for specific storage systems. The Kafka adapter is extended as an example.Also implements on-demand backfills and automatic repair of late/out-of-order data.
Both features make use of a new source-agnostic controller, which applies the SPI and generic backfill logic. No bespoke source-specific controller is required.
InputWatermarkProviderSPI (+ InputWatermarkService, TriggerInput, DataChange): a provider reports completeThrough(input) -> data-time completeness watermark, discovered via ServiceLoader. The generic TableTriggerReconciler advances the trigger's cursor to the frontier and launches the Job over the new window [watermark, timestamp]; sources without a provider keep firing on cron/manual.FIRE TRIGGER ... FROM ... TO ...: one-off backfills over an explicit output window, run as a separate Job that never moves the incremental cursor. Bounds accept absolute instants/dates, " UNIT AGO", and NOW; the end is capped at the watermark.Late-change repair: when a provider reports a change behind the watermark, the reconciler enqueues a bounded backfill for that window while keeping the user-facing watermark a monotone forward frontier (status.lateWatermark tracks arrival order internally).
Kafka event source: KafkaInputWatermark reports a topic's max record timestamp as its frontier (AdminClient.listOffsets/maxTimestamp), with a sample trigger.
Testing Done
None yet.