feat: findByIds read primitive by techiejd · Pull Request #62 · techiejd/payloadcms-vectorize

techiejd · 2026-06-01T06:49:43Z

Adds findByIds, a public method on VectorizedPayload and on the DbAdapter contract that fetches stored embedding records by primary key. It is a building block for "more like this."

By default, each record is returned with its text and metadata only. Pass populateEmbedding: true to also get the raw embedding vector back (the normal search/query API never returns it). The flag defaults to false so callers don't pay for the heavy vector unless they ask, which is why EmbeddingRecord.embedding is optional.

The id of each record is whatever search() returns as result.id, so a search result round-trips directly.

What's here

- Types and signatures: EmbeddingRecord (optional embedding?: number[]), VectorizedPayload.findByIds({ knowledgePool, ids, populateEmbedding? }), and DbAdapter.findByIds(payload, poolName, ids, populateEmbedding?), re-exported for adapters.
- Public wiring in createVectorizedPayloadObject (empty ids short-circuits to []), covered via the in-memory mock adapter.
- PostgreSQL: direct read with Drizzle inArray; the embedding column is selected only when populating; non-numeric ids are dropped via /^\d+$/.
- MongoDB: find({ _id: { $in } }) with { projection: { embedding: 0 } } when not populating; non-24-hex ids are dropped via the existing HEX24 guard (never throws).
- Cloudflare: official binding.getByIds(ids), typed by #61 (refactor(cf): adopt @cloudflare/workers-types); getByIds always returns values, so the vector is stripped post-fetch when not populating.
- Docs: adapters/README.md contract + root README.md Local API section.
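
As a rough illustration of the MongoDB item above, the sketch below shows how the id guard and the projection option could be derived before calling find. It is not the adapter's code: the HEX24 pattern, the MongoReadPlan type, and the planMongoRead helper are assumptions made up for this example, and the real adapter would also convert the surviving ids to ObjectId.

```typescript
// Assumed shape of the guard; the real HEX24 constant lives in the adapter source.
const HEX24 = /^[0-9a-f]{24}$/i

// Hypothetical helper type: what to pass to find() for a given request.
interface MongoReadPlan {
  validIds: string[]
  options: { projection?: { embedding: 0 } }
}

function planMongoRead(ids: string[], populateEmbedding = false): MongoReadPlan {
  // Malformed ids are dropped up front, so they become misses instead of errors.
  const validIds = ids.filter((id) => HEX24.test(id))
  // Exclude the heavy vector at the source unless the caller asked for it.
  const options: MongoReadPlan['options'] = populateEmbedding
    ? {}
    : { projection: { embedding: 0 } }
  return { validIds, options }
}
```

With populateEmbedding left at its default, the plan carries an exclusion projection, so the vector never leaves the database.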

populateEmbedding semantics

Default false → embedding omitted from each record. Where possible the read skips the vector at the source (pg: column not selected; mongo: { projection: { embedding: 0 } }); CF's getByIds always returns values, so it's stripped post-fetch.
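
For the Cloudflare case, where the backend always returns values, the post-fetch strip could look roughly like the sketch below. FetchedVector, EmbeddingRecordSketch, and toRecord are hypothetical names for this example, and the metadata handling is simplified; the adapter's actual mapping may differ.

```typescript
// Hypothetical shapes for illustration only.
interface FetchedVector {
  id: string
  values: number[]
  metadata?: Record<string, unknown>
}

interface EmbeddingRecordSketch {
  id: string
  embedding?: number[]
  [key: string]: unknown
}

function toRecord(v: FetchedVector, populateEmbedding = false): EmbeddingRecordSketch {
  // Metadata fields are spread first so the record id always wins.
  const record: EmbeddingRecordSketch = { ...(v.metadata ?? {}), id: v.id }
  // The vector is attached only on request; otherwise the key is absent entirely.
  if (populateEmbedding) record.embedding = v.values
  return record
}
```

Leaving the key off, rather than setting it to undefined, keeps the default result free of any embedding property.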

Contract

Misses dropped (result length may be < ids.length); order not guaranteed; empty ids → [] without a backend call; unknown/malformed ids treated as misses, not errors.

Tests

All green: pg 66 · cf 95 · mongodb 111 · userland vectorizedPayload 21. Each adapter has a dedicated findByIds spec covering both populateEmbedding: true (full record incl. numeric embedding + reserved + extension fields) and the default omit-embedding case, plus drop-misses and empty-ids.

Stacking

#61 (cf-workers-types) is now merged into main, and this PR is based on main (latest merged in). The CF findByIds relies on #61's typed getByIds.

…atures

The CF adapter's '/// <reference types="@cloudflare/workers-types" />' pulls in Workers ambient globals that redefine Request/Response (where .json() returns unknown, not the DOM's any). The root tsconfig included ./adapters/*/src/**, so 'tsc --noEmit' (build:types:all) leaked those globals into core endpoint/admin code and failed typecheck. Adapters are already typechecked independently via their own tsconfig.build.json in the CI build job, so root coverage of adapter sources was redundant.

Depend on Pick<Vectorize, 'query' | 'upsert' | 'deleteByIds' | 'getByIds'> via a named VectorizeBinding type instead of the full 8-method Vectorize contract. env.VECTORIZE remains assignable; CloudflareVectorizeBinding is kept as a deprecated alias for back-compat.
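
The narrowing described above can be sketched as follows. FullBindingSketch is a simplified, hypothetical stand-in for the full Vectorize type from @cloudflare/workers-types (the real contract has more methods and richer signatures), and makeFakeBinding is an invented test double.

```typescript
interface StoredVector {
  id: string
  values: number[]
}

// Hypothetical, simplified stand-in for the full Vectorize binding type.
interface FullBindingSketch {
  query(vector: number[], options?: { topK?: number }): Promise<unknown>
  insert(vectors: StoredVector[]): Promise<unknown>
  upsert(vectors: StoredVector[]): Promise<unknown>
  deleteByIds(ids: string[]): Promise<unknown>
  getByIds(ids: string[]): Promise<StoredVector[]>
  describe(): Promise<unknown>
}

// Depend only on the methods the adapter actually calls.
type NarrowBindingSketch = Pick<
  FullBindingSketch,
  'query' | 'upsert' | 'deleteByIds' | 'getByIds'
>

// A test double needs just four methods to satisfy the narrowed type.
function makeFakeBinding(seed: StoredVector[]): NarrowBindingSketch {
  const store = new Map<string, StoredVector>(seed.map((v) => [v.id, v] as const))
  return {
    async query() {
      return { matches: [] }
    },
    async upsert(vectors) {
      for (const v of vectors) store.set(v.id, v)
      return { count: vectors.length }
    },
    async deleteByIds(ids) {
      for (const id of ids) store.delete(id)
      return { count: ids.length }
    },
    async getByIds(ids) {
      // Unknown ids are simply absent from the result.
      const found: StoredVector[] = []
      for (const id of ids) {
        const v = store.get(id)
        if (v) found.push(v)
      }
      return found
    },
  }
}
```

Because Pick only removes requirements, any value of the full type remains assignable to the narrowed one, which is why env.VECTORIZE keeps working.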

# Conflicts: # adapters/mongodb/src/index.ts

Rename `payload.findEmbeddingsByIds` -> `payload.findByIds` and add an opt-in `populateEmbedding?: boolean` (default false). `EmbeddingRecord.embedding` is now optional and only returned when populateEmbedding is true. Each backend honors the flag at the source where possible: pg skips selecting the embedding column, mongodb uses { projection: { embedding: 0 } }, and CF strips values post-fetch (getByIds always returns them). DbAdapter.findByIds gains the populateEmbedding param; the shared mock and adapters README follow. Specs split into a populateEmbedding:true case (keeps the full-vector assertions) plus a default-omits-embedding case.

Root README advertised a `findEmbeddingsByIds` method that doesn't exist — the shipped public method is `findByIds`. Rename the method reference and example, document the `populateEmbedding?` param (default false), and fix the adapters/README EmbeddingRecord interface block to `embedding?: number[]` (optional, present only when populateEmbedding: true).

…by-ids

findByIds filtered ids through /^\d+$/ and mapped to Number, hardcoding an integer primary key. The embeddings collection defines no custom id, so under postgresAdapter({ idType: 'uuid' }) every embedding id is a uuid — the filter dropped all of them and findByIds returned [] for ids that exist, while search() round-tripped the same uuids fine. Pass ids straight to inArray; Postgres casts the text params to the column type, supporting both integer and uuid PKs. Well-formed but nonexistent ids are still absent from results; a malformed id now surfaces a backend error rather than being silently dropped (documented in adapters/README). Adds a uuid-idType regression spec.

chunkText and embeddingVersion are not required in the embeddings schema, so a null column was spread through raw as `null`, violating EmbeddingRecord / VectorSearchResult (`chunkText: string`). CF and MongoDB both coerce via String(x ?? '') → '', so identical data round-tripped as '' on those adapters but null on pg; a consumer doing record.chunkText.length crashed only on pg. Coerce sourceCollection/chunkText/embeddingVersion via String(x ?? '') in both mapRowsToRecords (findByIds) and mapRowsToResults (search) so pg matches the declared types and the other adapters. Adds a regression test.
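
A minimal sketch of the coercion described above, assuming a simplified row shape. RawRowSketch and normalizeRow are names invented for this example; the adapter's mapRowsToRecords and mapRowsToResults handle more fields.

```typescript
// Hypothetical row shape: nullable columns as they may come back from pg.
interface RawRowSketch {
  id: string | number
  sourceCollection?: string | null
  chunkText?: string | null
  embeddingVersion?: string | null
}

interface NormalizedRow {
  id: string
  sourceCollection: string
  chunkText: string
  embeddingVersion: string
}

function normalizeRow(row: RawRowSketch): NormalizedRow {
  return {
    id: String(row.id),
    // null and undefined both collapse to '', matching the declared string types.
    sourceCollection: String(row.sourceCollection ?? ''),
    chunkText: String(row.chunkText ?? ''),
    embeddingVersion: String(row.embeddingVersion ?? ''),
  }
}
```

After normalization, a consumer reading record.chunkText.length gets 0 for a null column instead of a TypeError.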

techiejd · 2026-06-07T01:46:23Z

Related to #60

findByIds now returns Record<string, EmbeddingRecord | undefined> instead of Array<EmbeddingRecord>. No backend preserves order and a lookup may miss, so an array forced callers to re-join by id and made misses a silent gap. Keying by the requested id makes the round-trip O(1) (records[searchHit.id]), makes order irrelevant, and turns a miss into an explicit undefined. Every requested id is a key.

Unify the malformed-id contract: unknown AND malformed ids map to undefined and never throw. pg now filters out ids that don't match the PK column type (getSQLType: numeric for integer/serial, uuid-shaped for uuid) before the IN query, so a bad id is a miss instead of a cast error that poisoned the batch. This matches mongo (non-24-hex ids dropped) and cf (unknown ids absent from getByIds).

Stop the mock adapter from swallowing real errors: only Payload NotFound is treated as a miss; everything else rethrows.

Docs + specs updated across all adapters. Note that key order is not input order (integer-like keys sort first), so callers must look up by id.
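
The keyed result and the pre-query id filter could be sketched as below. This is an illustration under assumptions, not the adapter's implementation: PkKind, the two regular expressions, queryableIds, and keyByRequestedId are invented for the example, and the PK kind is passed in directly rather than detected from the column.

```typescript
type PkKind = 'integer' | 'uuid'

// Assumed patterns for "matches the PK column type".
const INTEGER_ID = /^\d+$/
const UUID_ID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Keep only ids the backend can compare without a cast error.
function queryableIds(ids: string[], pk: PkKind): string[] {
  const pattern = pk === 'integer' ? INTEGER_ID : UUID_ID
  return ids.filter((id) => pattern.test(id))
}

// Every requested id becomes a key; anything not found maps to undefined.
function keyByRequestedId<T extends { id: string }>(
  ids: string[],
  found: T[],
): Record<string, T | undefined> {
  const byId = new Map(found.map((r) => [r.id, r] as const))
  const out: Record<string, T | undefined> = {}
  for (const id of ids) out[id] = byId.get(id)
  return out
}
```

Looking up by id is required because JavaScript objects enumerate integer-like keys first: Object.keys(keyByRequestedId(['b', '2'], [])) yields ['2', 'b'], not the input order.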

techiejd added 16 commits May 31, 2026 23:44

build(cf): add @cloudflare/workers-types devDependency

146549d

refactor(cf): adopt @cloudflare/workers-types Vectorize binding

0af470c

feat: add EmbeddingRecord type and findByIds/findEmbeddingsByIds sign…

3319f85

…atures

feat: wire findEmbeddingsByIds public method with mock-adapter coverage

4357d1d

feat(pg): implement findByIds read primitive

a5e95e5

feat(mongodb): implement findByIds read primitive

fd2d9d8

feat(cf): implement findByIds via Vectorize getByIds

1e53fee

docs: document findEmbeddingsByIds and findByIds contract

2fe55bc

docs: scope findByIds malformed-id behavior per adapter

ffb58d3

docs(cf): restore JSDoc on retained types after binding swap

fb98908

Merge remote-tracking branch 'origin/main' into feat/cf-workers-types

34c0d25

Merge branch 'feat/cf-workers-types' into feat/find-embeddings-by-ids

e82a623

# Conflicts: # adapters/mongodb/src/index.ts

techiejd changed the title ~~feat: findEmbeddingsByIds read primitive~~ feat: findByIds read primitive Jun 6, 2026

Merge remote-tracking branch 'origin/main' into feat/find-embeddings-…

674e7a7

…by-ids

techiejd changed the base branch from feat/cf-workers-types to main June 6, 2026 12:05

techiejd added 2 commits June 6, 2026 21:28

techiejd mentioned this pull request Jun 7, 2026

docs(cf): document pool read-isolation limitation + multi-pool guidance #67

Merged

techiejd force-pushed the feat/find-embeddings-by-ids branch from 4e4f1b4 to 1a9063d Compare June 7, 2026 01:44

techiejd mentioned this pull request Jun 7, 2026

Feature (change?) request: Allow searching with embedding #60

Open

techiejd merged commit ea6e169 into main Jun 11, 2026
8 checks passed

techiejd merged 20 commits into main from feat/find-embeddings-by-ids
