[HWORKS-2802 / -2807] Document partitioned_by parameter on feature group creation#585

Draft
jimdowling wants to merge 1 commit into logicalclocks:main from jimdowling:HWORKS-2802

jimdowling commented May 21, 2026

Summary

User-guide section documenting the new partitioned_by parameter on feature group creation. Lives under the existing partitioning area in docs/user_guides/fs/feature_group/create.md.

Covers:

  • Usage example with create_feature_group / get_or_create_feature_group.
  • The storage-engine-derived contract: the user's dataframe never carries the grain columns; Delta derives them server-side through GENERATED ALWAYS AS columns.
  • Validation rules (mutual exclusion with partition_key, requires event_time, enum membership).
  • Partition-pruning table: Delta auto-derives partition predicates from the GENERATED expressions for hierarchical specs (year / year+month / year+month+day / year+month+day+hour), so fg.read(start_time, end_time) and fg.filter(fg.event_time >= ...) prune at the partition level. Non-hierarchical specs (["month"], ["year","week"]) are valid but skip auto-derivation; only direct predicates on the grain columns prune.
  • Online feature store behavior: derived columns live offline-only by default; online_partition_columns=true opts into online materialization.
  • Hudi: rejected before HWORKS-2807; after HWORKS-2807 the same parameter works on Hudi via the server-side PartitionedByTransformer + CustomKeyGenerator.
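
For reviewers, the validation rules and the hierarchical / non-hierarchical distinction above can be sketched in plain Python. This is an illustrative sketch only, not the SDK or backend implementation: the helper names `validate_partitioned_by` and `is_hierarchical` are hypothetical, and `ALLOWED_GRAINS` is an assumption built from the grain names mentioned in this PR (the real enum may differ).

```python
from typing import Optional, Sequence

# ASSUMPTION: the set of accepted grains. This PR only mentions
# year, month, week, day, and hour; the real enum may differ.
ALLOWED_GRAINS = ("year", "month", "week", "day", "hour")

# Hierarchical specs named in the commit message are prefixes of this chain.
HIERARCHY = ("year", "month", "day", "hour")


def validate_partitioned_by(
    partitioned_by: Sequence[str],
    event_time: Optional[str] = None,
    partition_key: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical helper: raise ValueError if a documented rule is broken."""
    if not partitioned_by:
        return  # parameter not used, nothing to check
    if partition_key:
        raise ValueError("partitioned_by and partition_key are mutually exclusive")
    if not event_time:
        raise ValueError("partitioned_by requires event_time")
    unknown = [g for g in partitioned_by if g not in ALLOWED_GRAINS]
    if unknown:
        raise ValueError(f"unknown grains in partitioned_by: {unknown}")


def is_hierarchical(partitioned_by: Sequence[str]) -> bool:
    """Hypothetical helper: True for a non-empty prefix of year > month > day > hour."""
    spec = tuple(partitioned_by)
    return len(spec) > 0 and spec == HIERARCHY[: len(spec)]
```

Under these assumptions, `["year", "month", "day"]` passes validation and counts as hierarchical, while `["month"]` and `["year", "week"]` pass validation but are not hierarchical, matching the pruning behavior described in the table.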

Pairs with:

JIRA: HWORKS-2802. Engineering walkthrough: Confluence page.

Test plan

  • npx markdownlint-cli2 docs/user_guides/fs/feature_group/create.md clean.
  • uv run mkdocs build -s clean (run after the SDK PR lands, since the API reference plugin pulls from hopsworks-api main).
  • Visual check of the rendered section via mkdocs serve.

🤖 Generated with Claude Code

…tion

https://hopsworks.atlassian.net/browse/HWORKS-2802

Add a section to docs/user_guides/fs/feature_group/create.md
describing the storage-engine-native partitioned_by parameter for
Delta feature groups. Covers:

- Usage example with create_feature_group / get_or_create_feature_group.
- The CREATE TABLE … USING DELTA … GENERATED ALWAYS AS … contract:
  the storage layer derives the partition columns; the user's
  dataframe never carries them.
- Validation rules: mutual exclusion with partition_key, requires
  event_time.
- Partition pruning table — Delta auto-derives partition predicates
  from the GENERATED expressions for hierarchical specs (year /
  year+month / year+month+day / year+month+day+hour), so
  `fg.read(start_time=..., end_time=...)` and
  `fg.filter(fg.event_time >= ...)` prune at the partition level.
  Non-hierarchical specs (e.g. ["month"], ["year","week"]) are valid
  but skip the auto-derivation — only direct predicates on the
  grain columns prune. Recommend hierarchical specs.
- Online feature store behavior: derived columns live offline-only
  by default; online_partition_columns=true opts into online
  materialization. Until the onlinefs consumer filter ships, the
  backend rejects partitioned_by + online_enabled=true with the
  default online_partition_columns=false. Document both
  workarounds.
- Hudi: partitioned_by + HUDI is rejected at creation; Hudi support
  is tracked under a separate follow-up ticket.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jimdowling jimdowling changed the title [HWORKS-2802] Document partitioned_by parameter on feature group creation [HWORKS-2802 / -2807] Document partitioned_by parameter on feature group creation May 21, 2026