backup: add Redis hash encoder by bootjp · Pull Request #725 · bootjp/elastickv

bootjp · 2026-05-01T16:24:47Z

Summary

First of the Phase 0a wide-column encoders. Translates !hs|meta| + !hs|fld| records on the live store into one hashes/<encoded>.json file per Redis hash, per the design at docs/design/2026_04_29_proposed_snapshot_logical_decoder.md:327.
Stacked on top of backup: Redis simple-type encoders (strings, HLL, TTL routing) #713 (the strings/HLL/TTL encoder); reuses RedisDB, the SHA-fallback KEYMAP path, and the dirsCreated cache.
TTL is folded inline into expire_at_ms (not a sidecar) so a restorer can replay each hash in one shot. HandleTTL adds a redisKindHash route.
Field values reuse the UTF-8-or-base64 envelope from PR backup: SQS encoder for queue meta and messages.jsonl (Phase 0a) #714's SQS body for binary safety. Field names percent-encode (EncodeSegment) when not valid UTF-8 to keep JSON object keys as strings.
Empty hashes still emit a file (HLEN==0 is observable). Length mismatches between meta.Len and observed !hs|fld| keys surface as redis_hash_length_mismatch warnings.
Output is sorted by user key + field name for deterministic dumps (matches the round-9 sort pattern from backup: DynamoDB encoder for tables and items (Phase 0a) #716).

Self-review (per CLAUDE.md "five lenses")

Data loss — empty hashes preserved; declared/observed length mismatch warned; field value bytes preserved verbatim via base64 envelope when not UTF-8; SHA-fallback paths feed KEYMAP.
Concurrency — RedisDB is single-writer per the existing contract; the new hashes map follows the same per-key state model used by kindByKey/inlineTTLEmitted.
Performance — fields buffered per key (Redis hashes are typically small); flushHashes runs once at Finalize. Sort is O(n log n) over user keys + per-hash field names — acceptable at dump cadence.
Data consistency — kindByKey registration on both meta and field arrival prevents TTL mis-routing across snapshot record-ordering scenarios.
Test coverage — 8 sub-tests: round-trip, empty hash, inline TTL routing, length mismatch warning, binary value envelope, malformed meta, truncated field key, SHA-fallback keymap.

Test plan

go test -race ./internal/backup/... — clean.
GOOS=windows GOARCH=amd64 go build ./internal/backup/... — clean.
GOOS=js GOARCH=wasm go build ./internal/backup/... — clean.
golangci-lint run — clean.

First of the Phase 0a wide-column encoders (after the redis_string.go strings/HLL/TTL work in PR #713). Hashes follow a meta+per-field key shape on the live store (`!hs|meta|<len><userKey>` → `[Len(8)]`, `!hs|fld|<len><userKey><field>` → field bytes; see store/hash_helpers.go), so the encoder buffers each user key's fields in memory and flushes one `hashes/<encoded>.json` file per hash at Finalize. Output schema matches the design (docs/design/2026_04_29_proposed_snapshot_logical_decoder.md:327): { "format_version": 1, "fields": {"name": "alice"}, "expire_at_ms": null } Cross-cutting decisions: - TTL is folded INLINE into expire_at_ms (not a separate sidecar) so a restorer can replay the hash in one shot. HandleTTL routes redisKindHash records into the hash state rather than the strings_ttl.jsonl path. - Field VALUES use the same UTF-8-or-base64 envelope introduced for the SQS message body in PR #714: valid UTF-8 → plain JSON string; non-UTF-8 → `{"base64":"..."}`. Field NAMES that are not valid UTF-8 are percent-encoded (EncodeSegment) for the JSON object key — JSON object keys must be strings. - Empty hashes (HLEN==0) still emit a file because their existence is observable to clients (HEXISTS, HLEN). Mismatched declared-vs-observed length surfaces as `redis_hash_length_mismatch` warning. - SHA-fallback user keys feed back through the existing recordIfFallback path so KEYMAP.jsonl still reverses non-reversible filenames. - Output is sorted by user key + field name for deterministic byte-identical dumps across runs (matches the round-9 sort pattern from #716 schemaToPublic). Tests cover: round-trip, empty hash, inline TTL routing, length mismatch warning, binary value envelope, malformed meta, truncated field key, SHA-fallback keymap.

coderabbitai · 2026-05-01T16:24:53Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a0861e12-1a78-4734-8826-fa6a46e69490

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/backup-phase0a-redis-hash

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request implements Redis hash backup support, including handlers for hash metadata and fields, and logic to flush accumulated state to JSON files. The implementation handles binary-safe field values and provides deterministic output. Review feedback identifies a potential panic on 32-bit architectures due to unsafe integer conversion and suggests refactoring the hashState helper to centralize key-kind tracking and reduce redundant string allocations.

gemini-code-assist · 2026-05-01T16:27:44Z

+	ukLen := int(binary.BigEndian.Uint32(b[:hashUserKeyLenSize])) //nolint:gosec // bounded by len(b)
+	if len(b) < hashUserKeyLenSize+ukLen {
+		return nil, false
+	}
+	return b[hashUserKeyLenSize : hashUserKeyLenSize+ukLen], true


On 32-bit architectures, converting a uint32 to int can result in a negative value if the high bit is set (e.g., for lengths > 2GB). This would bypass the subsequent length check and cause a panic when slicing b. Using uint64 for the comparison ensures safety across all architectures.

Suggested change

ukLen := int(binary.BigEndian.Uint32(b[:hashUserKeyLenSize])) //nolint:gosec // bounded by len(b)

if len(b) < hashUserKeyLenSize+ukLen {

return nil, false

}

return b[hashUserKeyLenSize : hashUserKeyLenSize+ukLen], true

ukLen := binary.BigEndian.Uint32(b[:hashUserKeyLenSize])

if uint64(len(b)) < uint64(hashUserKeyLenSize)+uint64(ukLen) {

return nil, false

}

return b[hashUserKeyLenSize : hashUserKeyLenSize+int(ukLen)], true

gemini-code-assist · 2026-05-01T16:27:44Z

+	st := r.hashState(userKey)
+	st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design
+	st.metaSeen = true
+	r.kindByKey[string(userKey)] = redisKindHash
+	return nil


The update to r.kindByKey can be centralized within the hashState helper to avoid redundancy and ensure consistency across all hash-related handlers.

Suggested change

st := r.hashState(userKey)

st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design

st.metaSeen = true

r.kindByKey[string(userKey)] = redisKindHash

return nil

st := r.hashState(userKey)

st.declaredLen = int64(binary.BigEndian.Uint64(value)) //nolint:gosec // signed int64 by design

st.metaSeen = true

return nil

References

Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.

gemini-code-assist · 2026-05-01T16:27:45Z

+	st := r.hashState(userKey)
+	st.fields[string(fieldName)] = bytes.Clone(value)
+	r.kindByKey[string(userKey)] = redisKindHash
+	return nil


The update to r.kindByKey is redundant here if centralized in hashState.

Suggested change

st := r.hashState(userKey)

st.fields[string(fieldName)] = bytes.Clone(value)

r.kindByKey[string(userKey)] = redisKindHash

return nil

st := r.hashState(userKey)

st.fields[string(fieldName)] = bytes.Clone(value)

return nil

References

Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.

gemini-code-assist · 2026-05-01T16:27:45Z

+func (r *RedisDB) hashState(userKey []byte) *redisHashState {
+	if st, ok := r.hashes[string(userKey)]; ok {
+		return st
+	}
+	st := &redisHashState{fields: make(map[string][]byte)}
+	r.hashes[string(userKey)] = st
+	return st
+}


Centralizing the kindByKey update here improves maintainability and avoids repeated map writes in the handlers. Additionally, caching the string conversion of userKey avoids redundant allocations when the state is not found.

Suggested change

func (r *RedisDB) hashState(userKey []byte) *redisHashState {

if st, ok := r.hashes[string(userKey)]; ok {

return st

}

st := &redisHashState{fields: make(map[string][]byte)}

r.hashes[string(userKey)] = st

return st

}

func (r *RedisDB) hashState(userKey []byte) *redisHashState {

ukStr := string(userKey)

if st, ok := r.hashes[ukStr]; ok {

return st

}

st := &redisHashState{fields: make(map[string][]byte)}

r.hashes[ukStr] = st

r.kindByKey[ukStr] = redisKindHash

return st

}

References

Centralize common complex logic into shared helper functions to avoid code duplication and improve maintainability.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 72e2b75a61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-01T16:29:12Z

+		jsonName := name
+		if !utf8.ValidString(name) {
+			jsonName = EncodeSegment([]byte(name))
+		}


Disambiguate encoded and raw hash field names

This conversion only percent-encodes non-UTF8 field names while leaving UTF-8 names unchanged, so distinct Redis fields can collapse to the same JSON key (for example, a literal field name %FF and a binary field name 0xFF). In that case fields[jsonName] overwrites one value, silently dropping data from hashes that contain both forms; Redis hash field names are binary-safe, so this is a valid input shape.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-01T16:29:13Z

+	for _, name := range names {
+		jsonName := name
+		if !utf8.ValidString(name) {
+			jsonName = EncodeSegment([]byte(name))


Avoid SHA-fallback when encoding hash field names

EncodeSegment is a filename encoder with a 240-byte limit and SHA fallback; applying it to non-UTF8 field names can emit an irreversible fallback token for long names. Because hash field names are stored only as JSON object keys and no KEYMAP is written for per-field names, restoring such hashes cannot reconstruct the original field-name bytes, causing silent key corruption for long binary field names.

Useful? React with 👍 / 👎.

…tion (PR #725, round 1) Codex round 12 raised two P1 findings on commit 72e2b75; Gemini raised one high and three mediums alongside. P1: Hash field name encoding via JSON object key was lossy. - Distinct fields could collapse to the same JSON key: a UTF-8 literal `%FF` and a single binary byte `0xFF` both ended up as `%FF` after the EncodeSegment-on-non-UTF-8 fallback, silently dropping data on hashes with both forms. - Long binary field names (>240 bytes) hit EncodeSegment's SHA fallback which is irreversible at this layer (no per-field KEYMAP), so restore could not reconstruct the original bytes. Fix: change `fields` to a JSON array of typed `{name, value}` records. Both name and value go through the same UTF-8-or-base64 envelope (marshalRedisBinaryValue, mirrors PR #714's SQS body), so arbitrary binary bytes round-trip without collisions and without SHA fallback. High (Gemini): parseUserKeyLenPrefix used `int(uint32)` which wraps to negative on 32-bit when the high bit is set, bypassing the bounds check and causing a slice panic. Compare in `uint64` space before converting. Medium (Gemini, ×3): centralise `kindByKey` registration in the hashState helper so meta/field/TTL paths agree on the kind in one place. Audited callers — HandleHashMeta, HandleHashField, and HandleTTL's `case redisKindHash:` arm — all reach hashState via the same userKey identity, and the TTL caller already knows the kind so the now-implicit kindByKey write is idempotent. No other callers per `grep -n "r\.hashState(" internal/backup/`. Tests: - TestRedisDB_HashRoundTripBasic updated to assert the array shape via new hashFieldArray / hashFieldByName helpers. - TestRedisDB_HashEmptyHashStillEmitsFile, _BinaryFieldValue updated likewise. - NEW TestRedisDB_HashBinaryFieldNameRoundTripsViaArray drives the exact Codex collision pair (UTF-8 `%FF` + binary 0xFF) and asserts both names survive distinctly with their expected types. Verified: linux race tests, GOOS=windows, GOOS=js wasm, lint — all clean.

bootjp · 2026-05-02T10:28:47Z

@claude review

Round 1 fixes for Codex P1 round 12 + Gemini high/medium on commit 72e2b75 (commit cacd4489):

P1 — hash field name encoding was lossy: distinct Redis fields could collapse to the same JSON object key (UTF-8 literal %FF and binary 0xFF both percent-encoded to %FF); >240-byte binary names hit EncodeSegment's SHA fallback which is irreversible at this layer. Fixed by changing fields to a JSON array of typed {name, value} records — both name and value use the UTF-8-or-base64 envelope from PR #714's SQS body, so arbitrary binary bytes round-trip without collision. Test: TestRedisDB_HashBinaryFieldNameRoundTripsViaArray drives the exact collision pair.

High (Gemini) — 32-bit overflow in length parser: parseUserKeyLenPrefix used int(uint32) which wraps to negative on 32-bit architectures when the high bit is set, bypassing the bounds check. Now compares in uint64 space before converting.

Medium (Gemini ×3) — kindByKey centralisation: r.kindByKey[uk] = redisKindHash was duplicated across HandleHashMeta and HandleHashField. Centralised in hashState helper. Per the loop's caller-audit rule: I grep'd r\.hashState( across internal/backup/ — three callers (HandleHashMeta, HandleHashField, HandleTTL's redisKindHash arm) all reach hashState via the same userKey identity, and the TTL caller already knows the kind so the now-implicit write is idempotent. Documented the audit rationale in the helper's comment.

go test -race, GOOS=windows GOARCH=amd64 go build, GOOS=js GOARCH=wasm go build, and golangci-lint all clean.

claude · 2026-05-02T10:28:59Z

Claude finished @bootjp's task in 2m 54s —— View job

Review — Round 1 fixes (commit `cacd4489`)

Read changed files
Review P1 fix: hash field name array encoding
Review High fix: 32-bit overflow in length parser
Review Medium fix: kindByKey centralisation
Check for new issues introduced
Post final review

All three fixes are correct and well-targeted. Notes below in order of importance.

P1 — Array-of-records output shape ✅

The switch from a JSON-object fields map to []hashFieldRecord correctly closes both reported failure modes:

The UTF-8 literal %FF and the binary byte 0xFF now occupy separate array entries; neither can shadow the other.
Long binary field names no longer go through EncodeSegment/SHA-fallback at all — the base64 envelope carries the raw bytes directly.

marshalRedisBinaryValue is clean. The internal redisHashState.fields map[string][]byte still uses raw byte-sequences as Go string keys, which is correct (Go string comparison is byte-order; string([]byte{0xff}) != string([]byte("%FF"))). The path from HandleHashField → fields map → marshalHashJSON → marshalRedisBinaryValue is lossless.

TestRedisDB_HashBinaryFieldNameRoundTripsViaArray drives the exact collision pair. base64URLEncode in the test matches base64.RawURLEncoding bit-for-bit (single-byte 0xff → "_w" confirmed).

Nit (no action required): sort.Strings(names) sorts binary field names by byte order. Deterministic and correct, though worth noting in the format doc that the sort key is the raw byte sequence, not any user-visible ordering.

High — 32-bit bounds check in `parseUserKeyLenPrefix` ✅

if uint64(len(b)) < uint64(hashUserKeyLenSize)+uint64(ukLen) {
    return nil, false
}
return b[hashUserKeyLenSize : hashUserKeyLenSize+int(ukLen)], true //nolint:gosec // bounded above

The uint64 comparison is correct. After the check passes, on a 32-bit platform int is 32-bit; but since uint64(len(b)) >= 4 + uint64(ukLen) and len(b) is a non-negative int, we have ukLen <= len(b) - 4 <= MaxInt32 - 4, so int(ukLen) is non-negative. The nolint comment explains the invariant. ✅

Medium — `kindByKey` centralised in `hashState` ✅

The audit comment documents the three callers and explains why the idempotent write from the HandleTTL path is safe. No new callers are possible outside the package (unexported method). ✅

Two observations worth noting for the follow-up

1. RedisHashMetaDeltaPrefix ordering risk (nit)

RedisHashMetaPrefix = "!hs|meta|" and RedisHashMetaDeltaPrefix = "!hs|meta|d|" — the delta prefix has the meta prefix as a strict prefix. A dispatcher that checks HasPrefix(key, RedisHashMetaPrefix) before HasPrefix(key, RedisHashMetaDeltaPrefix) would route delta-meta keys to HandleHashMeta, where parseHashMetaKey would strip !hs|meta| and try to parse d|<…> as a 4-byte length — either silently producing garbage or returning ErrRedisInvalidHashKey. The constant is unused today (no delta handler yet), but the dispatcher registration order will matter when it lands. Worth checking that the dispatcher evaluates RedisHashMetaDeltaPrefix before RedisHashMetaPrefix.

2. TTL-before-meta snapshot ordering (minor, pre-existing)

TestRedisDB_HashTTLInlinedFromScanIndex puts TTL after meta+field. If a snapshot ever places !redis|ttl|<K> before the corresponding !hs|meta|<K> (e.g. a scanner ordered by key prefix), kindByKey[K] is still redisKindUnknown at TTL time, and the expiry is counted as an orphan and silently dropped from the hash JSON. This is pre-existing behaviour and the snapshot engine probably guarantees consistent ordering, but a test for the reversed order would lock down the behaviour explicitly (or expose a need for a two-pass approach). Not a blocker here.

Verdict: All three reviewer-identified defects are correctly fixed. The test coverage for the P1 scenario is solid. Ready to merge once the pre-existing ordering observation above is acknowledged.

…' into feat/backup-phase0a-redis-hash

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4646b9f7f6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-02T10:34:50Z

+	if len(fieldName) == 0 {
+		return cockroachdberr.Wrapf(ErrRedisInvalidHashKey,
+			"empty field name in key %q", key)


Permit empty hash field names

HandleHashField currently treats a zero-length field name as malformed, but Redis hash field names are binary-safe and may legitimately be empty. In this case a valid key like !hs|fld|<len><userKey> (with no trailing field bytes) will cause backup decoding to error out instead of emitting the hash, so backups can fail on real data created via HSET with an empty field name.

Useful? React with 👍 / 👎.

…' into feat/backup-phase0a-redis-hash

Codex P1 round 13 (commit 4646b9f): HandleHashField was rejecting zero-length field names as ErrRedisInvalidHashKey, but Redis hash field names are binary-safe and `HSET k "" v` is a valid command. The live store emits a key shaped exactly `!hs|fld|<len><userKey>` with no trailing field bytes for that case, so backup decoding would error out on real data created via HSET with an empty field name. Remove the explicit zero-length check; the array-of-records JSON shape introduced in round 1 already handles empty names without collision (an entry like {"name":"", "value":"..."} sorts and round-trips distinctly). Caller audit per loop's audit-semantics rule (fail-closed → fail-open behavior change for empty fieldName): - HandleHashField has only test-code callers; no production pipeline consumes it yet (per `grep -n "HandleHashField" internal/backup/`). - The truncated-key test (TestRedisDB_HashRejectsTruncatedFieldKey) drives a malformed binary key whose parseHashFieldKey returns ok=false; that branch still surfaces ErrRedisInvalidHashKey, independent of the empty-name check we removed. Test: TestRedisDB_HashAcceptsEmptyFieldName drives one empty- name field and one named field, asserts both survive distinct records in the output JSON.

bootjp · 2026-05-02T10:43:56Z

@claude review

Round 2 fix for Codex P1 round 13 on commit 4646b9f (commit f44a4e7b):

HandleHashField was rejecting zero-length field names as ErrRedisInvalidHashKey, but Redis hash field names are binary-safe and HSET k "" v is a valid command. The live store emits exactly !hs|fld|<len><userKey> with no trailing field bytes for that case, so backup decoding would error out on real data.

Removed the explicit zero-length check; the array-of-records JSON shape introduced in round 1 already handles empty names without collision (an entry like {"name":"", "value":"..."} round-trips distinctly).

Caller audit per loop's audit-semantics rule (fail-closed → fail-open for empty fieldName):

HandleHashField has only test-code callers (no production pipeline consumes it yet).
The truncated-key test TestRedisDB_HashRejectsTruncatedFieldKey drives a malformed binary key where parseHashFieldKey returns ok=false; that branch still surfaces ErrRedisInvalidHashKey, independent of the empty-name check.

Test: TestRedisDB_HashAcceptsEmptyFieldName drives one empty-name field and one named field, asserts both survive distinctly in the output JSON. go test -race and golangci-lint clean.

claude · 2026-05-02T10:44:12Z

Claude finished @bootjp's task in 1m 29s —— View job

Review — Round 2 fix (commit `f44a4e7b`)

Read changed files
Review Round 2 fix: empty hash field name acceptance
Verify test coverage for empty field name
Check for new issues introduced
Post final review

The fix is correct and minimal. Analysis below.

The change

HandleHashField previously held:

if len(fieldName) == 0 {
    return cockroachdberr.Wrapf(ErrRedisInvalidHashKey,
        "empty field name in key %q", key)
}

This is removed. The trigger is legitimate: Redis hash field names are binary-safe and HSET k "" v is a valid command that the live store encodes as !hs|fld|<4-byte ukLen><userKey> with no trailing bytes. The zero-length rejection would therefore cause real backups to fail on valid data.

Correctness walk-through

parseHashFieldKey with no trailing bytes:

fieldName := rest[hashUserKeyLenSize+len(userKey):]
return userKey, fieldName, true

When the key is exactly !hs|fld|<4><userKey> with nothing after the user key bytes, rest[4+len(userKey):] is an empty slice. parseUserKeyLenPrefix already verified that len(rest) >= 4 + ukLen, so no out-of-bounds — returns (userKey, []byte{}, true). ✅

Map key collision with empty field name:

st.fields[string(fieldName)] — Go's empty string "" is a valid and distinct map key. No path in the encoder writes "" as a field name except a genuine empty-field-name arrival, so there is no collision risk. ✅

Duplicate empty field names (two HSET k "" … operations):

st.fields[""] = bytes.Clone(value) overwrites on a second arrival — consistent with Redis semantics (last write wins for the same field). ✅

declaredLen / length-mismatch warning:

The test registers declaredLen=2 and inserts exactly 2 fields (empty-name + "named"). The mismatch warning is NOT triggered. Correct. ✅

Sort order:

sort.Strings sorts the raw byte sequences of field names. Empty string "" sorts before any non-empty string, so fields[0]["name"] being "" in the test is correct. ✅

Test coverage

TestRedisDB_HashAcceptsEmptyFieldName covers:

Meta + 2 fields (one empty-name, one named) → Finalize() succeeds
Array length is 2
fields[0] has name="", value="empty-name-value"
fields[1] has name="named"

TestRedisDB_HashRejectsTruncatedFieldKey remains intact: it drives a key where parseHashFieldKey returns ok=false (malformed ukLen that exceeds the available bytes), and that path still surfaces ErrRedisInvalidHashKey. The removal of the zero-length check does not affect this test. ✅

One minor nit (no action required)

internal/backup/redis_hash.go:72 — the new comment includes "Codex P1 round 13 (PR #725)" which is a task reference. CLAUDE.md says these belong in PR descriptions rather than code comments since they rot as the codebase evolves. The rest of the comment — explaining that empty field names are binary-safe in Redis and that the live store emits exactly the no-trailing-bytes key shape — is the non-obvious why that justifies keeping a comment at all. A future reader would benefit from it even without the PR reference. Worth noting if the project ever does a comment cleanup pass.

Verdict: The fix is minimal, correct, and the regression test is solid. Pre-existing open observations from Round 1 (delta-prefix dispatch ordering, TTL-before-meta snapshot ordering) remain unchanged and were acknowledged in the last round. Ready to merge.

…' into feat/backup-phase0a-redis-hash

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6abf62f547

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-02T11:45:58Z

+// parseHashMetaKey strips !hs|meta| and the 4-byte BE userKeyLen prefix.
+// Returns (userKey, true) on success.
+func parseHashMetaKey(key []byte) ([]byte, bool) {
+	rest := bytes.TrimPrefix(key, []byte(RedisHashMetaPrefix))


Exclude hash meta-delta keys from meta parsing

parseHashMetaKey trims RedisHashMetaPrefix, which also matches !hs|meta|d|... delta keys, so those records are treated as base-meta candidates and then fail parsing as malformed keys. Delta keys are a normal part of live Redis hash state (the codebase even documents that delta prefixes share the meta prefix in adapter/redis_retry.go), so when snapshot dispatch sends hash-prefixed records to this encoder, valid snapshots containing unresolved deltas can abort with ErrRedisInvalidHashKey instead of being skipped or handled.

Useful? React with 👍 / 👎.

Codex P1 round 14 (commit 6abf62f): The delta prefix `!hs|meta|d|` shares the base meta prefix `!hs|meta|` (the live store/hash_helpers.go documents the same nesting). A snapshot dispatcher that routes records by "starts with RedisHashMetaPrefix" lands delta records in HandleHashMeta, where parseHashMetaKey trimmed the meta prefix and tried to parse `d|<len>...` as `<len><userKey>` — failing with ErrRedisInvalidHashKey on real snapshots that contain unresolved deltas. Phase 0a's hash output emits an array of OBSERVED fields (`!hs|fld|...` records), so the delta arithmetic the live system uses to recompute Len is not needed at backup time. Skip delta keys silently in HandleHashMeta. Defense-in-depth: parseHashMetaKey also returns `!ok` for delta-prefixed keys so a future refactor that bypasses the HandleHashMeta short-circuit still surfaces a parse failure rather than silent state corruption. Caller audit (per loop's audit-semantics rule for fail-closed → fail-open behavior change): - HandleHashMeta has only test-code callers; no production pipeline consumes it yet. - parseHashMetaKey is called only from HandleHashMeta. The !ok return is mapped to ErrRedisInvalidHashKey in HandleHashMeta's existing path; the delta short-circuit runs first now so the parser path never sees a delta key in normal flow. Test: TestRedisDB_HashHandleHashMetaSkipsDeltaKey synthesises a realistic delta key and asserts both that HandleHashMeta returns nil and that parseHashMetaKey returns !ok directly.

bootjp · 2026-05-02T11:51:01Z

@claude review

Round 3 fix for Codex P1 round 14 on commit 6abf62f (commit b757dadf):

The delta prefix !hs|meta|d| shares the base meta prefix !hs|meta| (the live store/hash_helpers.go documents the same nesting). A snapshot dispatcher that routes records by "starts with RedisHashMetaPrefix" landed delta records in HandleHashMeta, where parseHashMetaKey trimmed the meta prefix and tried to parse d|<len>... as <len><userKey> — failing with ErrRedisInvalidHashKey on real snapshots containing unresolved deltas.

Phase 0a's hash output emits an array of OBSERVED fields (!hs|fld| records), so the delta arithmetic the live system uses to recompute Len is not needed at backup time. HandleHashMeta now skips delta keys silently. Defense-in-depth: parseHashMetaKey also returns !ok for delta-prefixed keys so a future refactor that bypasses the HandleHashMeta short-circuit surfaces a parse failure rather than silent state corruption.

Caller audit per loop's rule (fail-closed → fail-open behavior):

HandleHashMeta has only test-code callers; no production pipeline yet.
parseHashMetaKey is called only from HandleHashMeta; the !ok return path still maps to ErrRedisInvalidHashKey for non-delta malformed keys.

Test: TestRedisDB_HashHandleHashMetaSkipsDeltaKey synthesises a realistic delta key and asserts both that HandleHashMeta returns nil and that parseHashMetaKey returns !ok directly.

go test -race and golangci-lint clean.

claude · 2026-05-02T11:51:16Z

Claude finished @bootjp's task in 1m 40s —— View job

Review — Round 3 fix (commit `b757dadf`)

Read changed files
Review delta-key skip logic in HandleHashMeta
Review defense-in-depth parseHashMetaKey change
Check test coverage for TestRedisDB_HashHandleHashMetaSkipsDeltaKey
Check for new issues introduced
Post final review

The fix is correct and the defense-in-depth approach is better than what I flagged in Round 1. Analysis below.

The problem recapped

RedisHashMetaDeltaPrefix = "!hs|meta|d|" is a strict extension of RedisHashMetaPrefix = "!hs|meta|". A dispatcher routing by HasPrefix(key, RedisHashMetaPrefix) routes delta keys into HandleHashMeta. The old parseHashMetaKey stripped !hs|meta| and tried to parse d|<4-byte len>... as <4-byte len><userKey> — producing ErrRedisInvalidHashKey on valid snapshots containing unresolved deltas. Confirmed against store/hash_helpers.go: HashMetaDeltaPrefix = "!hs|meta|d|" (line 19) — the prefix strings in the backup package match the live store exactly.

`HandleHashMeta` skip (lines 63-65) ✅

if bytes.HasPrefix(key, []byte(RedisHashMetaDeltaPrefix)) {
    return nil
}

The skip happens before r.hashState(userKey), so no phantom hash state is created for a delta-only key. Phase 0a emits observed fields (!hs|fld| records), so the delta arithmetic that the live system uses to recompute Len is genuinely unneeded here — the design rationale is correct.

This resolves the "dispatcher evaluation order" concern I raised in Round 1 with a better approach: rather than requiring the dispatcher to evaluate the longer prefix first, the handler itself is now safe regardless of how the dispatcher is ordered. ✅

`parseHashMetaKey` defense-in-depth (lines 133-135) ✅

if bytes.HasPrefix(key, []byte(RedisHashMetaDeltaPrefix)) {
    return nil, false
}

The parser-level guard is correct defense-in-depth. A future caller that bypasses the HandleHashMeta short-circuit would get a clean parse failure (!ok → ErrRedisInvalidHashKey) rather than silent state corruption from interpreting d| as a 4-byte length prefix. The comment accurately describes this invariant: "Callers that want delta-tolerant behavior (HandleHashMeta) should detect the delta prefix BEFORE calling this function." ✅

The ordering is safe: the delta check runs before bytes.TrimPrefix(key, []byte(RedisHashMetaPrefix)), so the "not a meta key at all" check (len(rest) == len(key)) is never evaluated for a delta key — which is fine since returning nil, false early is already the correct result.

Test: `TestRedisDB_HashHandleHashMetaSkipsDeltaKey` ✅

The test synthesises a structurally realistic delta key:

!hs|meta|d| + [4-byte BE len=1] + "k" + [8-byte commitTS=1] + [4-byte seqInTxn=0]

This matches store.HashMetaDeltaKey("k", 1, 0) byte-for-byte (same prefix, same encoding shape). Both assertions are present:

HandleHashMeta returns nil → silent skip is exercised ✅
parseHashMetaKey returns !ok → parser-level guard is exercised ✅

One minor gap (no action required): the test doesn't call Finalize() to confirm the delta key doesn't accidentally create an entry in r.hashes. It doesn't need to — HandleHashMeta returns before r.hashState() is called, so no state is created. But if future reviewers see only this test in isolation, the coverage relationship to flushHashes is implicit. The existing round-trip tests cover Finalize; not a gap.

Outstanding observations (from prior rounds)

TTL-before-meta snapshot ordering (Round 1, pre-existing): No change here; still a pre-existing behaviour worth a follow-up test but not a blocker.

Task-reference comments (Round 2 nit, pre-existing): The HandleHashMeta and parseHashMetaKey comments both include "Codex P1 round 14 (PR #725 #13)" inline. Per CLAUDE.md these belong in commit/PR descriptions, not code. Not a blocker; worth a future cleanup pass.

Verdict: All three rounds of reviewer-identified defects are correctly fixed. The Round 3 approach (make the handler robust regardless of dispatcher order + defense-in-depth at the parser boundary) is cleaner than requiring dispatcher ordering and directly addresses what I flagged in Round 1. Ready to merge.

…lob layout Codex flagged that the wide-column zset encoder skips the legacy consolidated single-key blob layout the live store still writes. A zset stored only as `!redis|zset|<userKey>` (with the magic- prefixed pb.RedisZSetValue body) is silently dropped from backup output and any inline TTL becomes an orphan — user-visible sorted-set data loss. Live-side references (adapter, not changed by this commit): - adapter/redis_compat_types.go:82 — redisZSetPrefix - adapter/redis_compat_commands.go:3495-3508 — writes the blob for non-empty persisted zset updates - adapter/redis_compat_helpers.go:610-631 — reads it as the fallback when no wide-column members exist Fix: new public RedisDB.HandleZSetLegacyBlob method that decodes the magic-prefixed pb.RedisZSetValue and registers the same per- member state HandleZSetMember would. The wide-column merge case (mid-migration snapshot containing BOTH layouts for the same user key) works because `!redis|zset|` sorts BEFORE `!zs|...` so the blob arrives first and wide-column rows then update / add members via the latest-wins map. Inline TTL: `!redis|zset|<k>` sorts BEFORE `!redis|ttl|<k>`, so HandleTTL after this handler sees redisKindZSet already and folds via the case-redisKindZSet branch. No pendingTTL detour needed for this ordering. Fail-closed contract (matches existing wide-column path): - Missing magic prefix → ErrRedisInvalidZSetLegacyBlob - Unmarshal error → ErrRedisInvalidZSetLegacyBlob - NaN score → ErrRedisInvalidZSetLegacyBlob (Redis ZADD rejects NaN at wire level) Caller audit (per /loop standing instruction): new public method HandleZSetLegacyBlob has no external callers. Verified via 'grep -rn HandleZSetLegacyBlob --include=*.go' — all matches inside the test file in this PR. The cmd/elastickv-snapshot-decode dispatcher (Phase 0a follow-up, not yet built) will route the `!redis|zset|` prefix to this handler. Parallel bug class: the SAME issue exists for `!redis|hash|`, `!redis|set|`, and `!redis|stream|` legacy blob prefixes. Those encoders shipped in earlier PRs (#725, #758, #791). Each needs its own legacy-blob handler in a follow-up PR; this commit fixes only the zset case codex flagged on PR #790. New tests: - TestRedisDB_ZSetLegacyBlobRoundTrip — basic round-trip - TestRedisDB_ZSetLegacyBlobThenWideColumnMerges — mid-migration - TestRedisDB_ZSetLegacyBlobWithInlineTTL — TTL ordering - TestRedisDB_ZSetLegacyBlobRejectsMissingMagic — fail-closed - TestRedisDB_ZSetLegacyBlobRejectsNaNScore — fail-closed - TestRedisDB_ZSetLegacyBlobRejectsMalformedKey — fail-closed Self-review: 1. Data loss — exact opposite: this commit RECOVERS zsets that were silently dropped. New fail-closed guards prevent silently importing a corrupt blob. 2. Concurrency — no new shared state; per-DB sequential as before. 3. Performance — one protobuf Unmarshal per legacy zset key (same as the live read path). Member map shares the same latest-wins behavior as wide-column intake. 4. Consistency — merge order (blob first, wide-column second) is determined by snapshot lex order; tested explicitly. 5. Coverage — 6 new tests. All 84 redis tests pass.

## Summary Adds the Redis sorted-set (zset) encoder for the Phase 0 logical snapshot decoder (`docs/design/2026_04_29_proposed_snapshot_logical_decoder.md`, lines 334-335). Mirrors the hash/list/set encoders shipped in #725/#755/#758. After this lands, Phase 0a's remaining Redis work is the stream encoder + HLL TTL sidecar; the per-adapter encoders themselves will be complete. Output shape per the design: ```json {"format_version": 1, "members": [{"member": "alice", "score": 100}], "expire_at_ms": null} ``` Members sorted by raw byte order (not by score) so `diff -r` between two dumps with the same logical contents stays line-stable across score-only mutations. Scores emit as JSON numbers for finite values, ZADD-conventional `"+inf"` / `"-inf"` strings for ±Inf (`json.Marshal` rejects non-finite floats). NaN scores fail closed at intake — Redis's ZADD rejects NaN at the wire level, so a NaN at backup time indicates corruption. Wire layout (mirrors `store/zset_helpers.go`): - `!zs|meta|<userKeyLen(4)><userKey>` → 8-byte BE Len - `!zs|mem|<userKeyLen(4)><userKey><member>` → 8-byte IEEE 754 score - `!zs|scr|<userKeyLen(4)><userKey><sortableScore(8)><member>` → silently discarded (secondary index; `!zs|mem|` is the source of truth) - `!zs|meta|d|<userKeyLen(4)><userKey><commitTS(8)><seqInTxn(4)>` → silently skipped (delta family; same policy as hash/set encoders) TTL routing: `!redis|ttl|<userKey>` for a registered zset folds into the JSON `expire_at_ms` field, matching the set/list/hash inlining, so a restorer replays ZADD + EXPIRE in one shot. ## Self-review (5 lenses) 1. **Data loss** — `!zs|mem|` is the source of truth; `!zs|scr|` and delta records are intentional silent skips with audit notes. Empty zsets (Len=0 but meta seen) still emit a file because `TYPE k → zset` is observable to clients. 2. **Concurrency / distributed** — `RedisDB` is sequential per scope (matches the existing per-DB encoder contract); no shared state. 3. **Performance** — per-zset state in a map, flushed once at Finalize. Bounded by `maxWideColumnItems` on the live side; sort is O(n log n) on member-name bytes. Identical cost shape to hash/set encoders. 4. **Data consistency** — JSON field order pinned via struct tags (not map). Inf score uses `json.RawMessage` so the same `score` key emits as either a number or a string. NaN fails closed. 5. **Test coverage** — 17 table-driven tests under `internal/backup/redis_zset_test.go`: - round-trip basic / empty / TTL inlining - binary member via base64 envelope - delta-key skip (both `HandleZSetMeta` entry + `parseZSetMetaKey` guard) - `HandleZSetMetaDelta` explicit entry - `HandleZSetScore` silent-discard - malformed meta length / overflow / MaxInt64 boundary - malformed member-value length / NaN rejection - ±Inf string-form serialization - members-without-meta still emits file - duplicate-members latest-wins (ZADD semantics) ## Caller audit (per `/loop` standing instruction) Semantics-changing edit: new `case redisKindZSet:` branch in `HandleTTL` (redis_string.go:310). Purely additive — the new branch fires only when `zsetState()` has previously registered the key. No existing handler maps to `redisKindZSet`, so no prior call site changes behavior. Verified: ``` grep -n 'redisKindZSet' internal/backup/ # internal/backup/redis_string.go:88 # internal/backup/redis_string.go:310 # internal/backup/redis_zset.go:166 ``` Three references, all new in this PR. No legacy caller assumes the prior behavior. ## Test plan - [x] `go test -race ./internal/backup/` → ok - [x] `golangci-lint run ./internal/backup/...` → 0 issues - [x] `go build ./...` → ok - [x] `go vet ./internal/backup/...` → ok  ## Summary by CodeRabbit * **New Features** * Added Redis sorted-set (ZSET) backup/restore with deterministic JSON output, proper ±Inf score serialization, and legacy-blob reconciliation. * TTLs for ZSETs and sets can be inlined so restored items retain expiry; pending TTLs are drained when a key is first typed. * Orphan-TTL buffering with configurable byte-cap and a fail-closed overflow behavior; option to disable buffering. * **Tests** * Extensive tests covering ZSET semantics, TTL buffering/drain, legacy-blob handling, byte-cap edge cases, and overflow behavior.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/790?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

gemini-code-assist Bot reviewed May 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 1, 2026

View reviewed changes

Merge remote-tracking branch 'origin/feat/backup-phase0a-redis-simple…

4646b9f

…' into feat/backup-phase0a-redis-hash

chatgpt-codex-connector Bot reviewed May 2, 2026

View reviewed changes

bootjp added 2 commits May 2, 2026 19:41

Merge remote-tracking branch 'origin/feat/backup-phase0a-redis-simple…

5974393

…' into feat/backup-phase0a-redis-hash

bootjp added 2 commits May 2, 2026 20:08

Merge remote-tracking branch 'origin/feat/backup-phase0a-redis-simple…

2f1279d

…' into feat/backup-phase0a-redis-hash

Merge remote-tracking branch 'origin/feat/backup-phase0a-redis-simple…

6abf62f

…' into feat/backup-phase0a-redis-hash

bootjp mentioned this pull request May 2, 2026

backup: Redis simple-type encoders (strings, HLL, TTL routing) #713

Merged

3 tasks

chatgpt-codex-connector Bot reviewed May 2, 2026

View reviewed changes

Base automatically changed from feat/backup-phase0a-redis-simple to main May 2, 2026 14:25

bootjp merged commit a4142fa into main May 2, 2026
5 checks passed

bootjp deleted the feat/backup-phase0a-redis-hash branch May 2, 2026 14:26

claude Bot mentioned this pull request May 16, 2026

backup: Redis set encoder (Phase 0a) #758

Merged

3 tasks

This was referenced May 19, 2026

backup: Redis zset encoder (Phase 0a) #790

Merged

backup: Redis stream encoder (Phase 0a) #791

Merged

Conversation

bootjp commented May 1, 2026

Summary

Self-review (per CLAUDE.md "five lenses")

Test plan

Uh oh!

coderabbitai Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 1, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 2, 2026

Uh oh!

claude Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review — Round 1 fixes (commit cacd4489)

P1 — Array-of-records output shape ✅

High — 32-bit bounds check in parseUserKeyLenPrefix ✅

Medium — kindByKey centralised in hashState ✅

Two observations worth noting for the follow-up

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 2, 2026

Uh oh!

claude Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review — Round 2 fix (commit f44a4e7b)

The change

Correctness walk-through

Test coverage

One minor nit (no action required)

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 2, 2026

Uh oh!

claude Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review — Round 3 fix (commit b757dadf)

The problem recapped

HandleHashMeta skip (lines 63-65) ✅

parseHashMetaKey defense-in-depth (lines 133-135) ✅

Test: TestRedisDB_HashHandleHashMetaSkipsDeltaKey ✅

Outstanding observations (from prior rounds)

coderabbitai Bot commented May 1, 2026 •

edited

Loading

claude Bot commented May 2, 2026 •

edited

Loading

Review — Round 1 fixes (commit `cacd4489`)

High — 32-bit bounds check in `parseUserKeyLenPrefix` ✅

Medium — `kindByKey` centralised in `hashState` ✅

claude Bot commented May 2, 2026 •

edited

Loading

Review — Round 2 fix (commit `f44a4e7b`)

claude Bot commented May 2, 2026 •

edited

Loading

Review — Round 3 fix (commit `b757dadf`)

`HandleHashMeta` skip (lines 63-65) ✅

`parseHashMetaKey` defense-in-depth (lines 133-135) ✅

Test: `TestRedisDB_HashHandleHashMetaSkipsDeltaKey` ✅