[FSTORE-2028] Add MongoDB as a Datasource #578

Open
jimdowling wants to merge 1 commit into logicalclocks:main from jimdowling:fstore-2028-mongodb-s3-parquet

@jimdowling jimdowling commented May 18, 2026

Summary

Documentation for the two follow-on additions to the FSTORE-2028 datasource work:

  • MongoDB data source: new how-to under `data_source/creation/mongodb.md` (anchor `data-source-mongodb`), wired into the data-source index and the mkdocs nav. Covers the split between the connection string and per-user auth, Compass-style schema inference for schemaless collections, and the `_id` → `id` rename caveat.
  • S3 as a Parquet data source for external Feature Groups: the S3 how-to now explains the picker behaviour (select a single `*.parquet` file or a parquet directory) and how directory mode handles schema evolution (`union_by_name`) and Hive partition pushdown.
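
For reviewers unfamiliar with the MongoDB caveats, here is a minimal sketch of what sampling-based schema inference and the `_id` → `id` rename could look like. It is plain Python over in-memory dicts and is not the Hopsworks implementation; `infer_schema`, the sample documents, and the `"mixed"` label are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


def infer_schema(sample_docs):
    """Union the top-level fields seen across a sample of documents."""
    seen = defaultdict(set)
    for doc in sample_docs:
        for field, value in doc.items():
            # Illustrative rename: MongoDB's `_id` key is surfaced as `id`.
            name = "id" if field == "_id" else field
            seen[name].add(type(value).__name__)
    # A field observed with more than one type is flagged as "mixed"
    # (a hypothetical label, used here only to show the schemaless caveat).
    return {
        name: next(iter(types)) if len(types) == 1 else "mixed"
        for name, types in seen.items()
    }


# Hypothetical sample: documents in one collection need not share fields.
docs = [
    {"_id": "a1", "name": "Ann", "text": "Great film"},
    {"_id": "a2", "name": "Bob", "rating": 4},
    {"_id": "a3", "name": "Cy", "rating": "n/a"},
]
schema = infer_schema(docs)
print(schema)
```

Because the schema comes from a sample, a field that only appears in unsampled documents would be missed, which is the reason the how-to calls out the schemaless caveat.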

`create_external.md` gains single-file + directory S3 examples and a MongoDB example. The Feature Query Service coverage list is updated to reflect the current set of supported sources for `.read()` and `.show()` from the Python engine.
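
The directory behaviour described above can be illustrated with a small, self-contained sketch. It models parquet files as in-memory lists of row dicts, so it only demonstrates the semantics (columns merged by name, `key=value` path segments exposed as columns, files skipped when a partition filter rules them out). The helper names, paths, and data are assumptions for illustration, not the query engine's API.

```python
def hive_partitions(path):
    """Parse `key=value` directory segments of a path into a dict."""
    parts = {}
    for segment in path.split("/")[:-1]:  # ignore the file name itself
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def union_by_name(files):
    """Merge rows from all files, aligning columns by name.

    `files` maps a path to a list of row dicts. Columns missing from a
    file are filled with None instead of raising a schema mismatch.
    """
    columns = []
    for path, rows in files.items():
        names = list(hive_partitions(path)) + [c for row in rows for c in row]
        for name in names:
            if name not in columns:
                columns.append(name)
    merged = []
    for path, rows in files.items():
        parts = hive_partitions(path)
        for row in rows:
            full = {**row, **parts}
            merged.append({c: full.get(c) for c in columns})
    return columns, merged


def prune(files, **wanted):
    """Partition pushdown: keep only files whose path matches the filter."""
    return {
        path: rows
        for path, rows in files.items()
        if all(hive_partitions(path).get(k) == v for k, v in wanted.items())
    }


# Hypothetical directory: the 2025 file gained a `country` column.
files = {
    "events/year=2024/part-0.parquet": [{"user": "u1", "clicks": 3}],
    "events/year=2025/part-0.parquet": [
        {"user": "u2", "clicks": 5, "country": "SE"}
    ],
}
columns, rows = union_by_name(files)
only_2025 = prune(files, year="2025")
```

In this sketch the 2024 row gets `country` set to `None`, and a filter on `year` selects files from their paths alone without reading any rows.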

Test plan

  • mkdocs nav renders MongoDB under Data Source → Configuration and Creation
  • Cross-references resolve (`[data-source-mongodb]`, `[data-source-sap-hana]`, internal relative paths)
  • Code snippets in `create_external.md` exercised against a live cluster (single-file + directory S3 parquet, MongoDB `sample_mflix.comments`)

🤖 Generated with Claude Code

Documentation for two additions:

- MongoDB data source: new how-to under
  data_source/creation/mongodb.md, linked from the data source index
  and the mkdocs nav, with the same anchor (data-source-mongodb) the
  external-FG and usage guides reference. Covers the connection
  string + per-user auth split, Compass-style schema inference, and
  the schemaless / _id-rename caveats.

- S3 as a Parquet data source for external Feature Groups: the S3
  how-to now explains that the bucket is browsable, individual
  *.parquet files and parquet directories can each be registered as
  external Feature Groups, and directory mode handles schema
  evolution (union_by_name) plus Hive partition pushdown
  automatically.  create_external.md gains both a single-file and a
  directory example for S3, plus a MongoDB example, and the
  Feature Query Service coverage list is updated to reflect the
  current set of supported sources for .read() and .show().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>