backup: cmd/elastickv-snapshot-decode CLI + CHECKSUMS writer by bootjp · Pull Request #810 · bootjp/elastickv

bootjp · 2026-05-23T12:20:51Z

Summary

Adds the Phase 0a operator-facing binary on top of the dispatcher in #806. Plus the CHECKSUMS writer / verifier and the <uint64>.fsm filename → snapshot-index parser the CLI needs to stamp MANIFEST.json.

Stacked on backup/snapshot-dispatcher — once #806 merges, this PR's diff collapses to just the CLI + helpers and I'll retarget to main.

What lands

File	Purpose
`cmd/elastickv-snapshot-decode/main.go`	Flag parsing, `DecodeSnapshot` invocation, `MANIFEST.json` + `CHECKSUMS` emission
`cmd/elastickv-snapshot-decode/main_test.go`	End-to-end synthetic-snapshot golden path + four flag-validation paths + `parseAdapterSet` table test
`internal/backup/checksums.go`	`WriteChecksums(root)` / `VerifyChecksums(root)` — sha256sum(1)-compatible
`internal/backup/source.go`	`SnapshotIndexFromPath` — `<uint64>.fsm` filename parser used to stamp `MANIFEST.snapshot_index`
`internal/backup/checksums_test.go`, `source_test.go`	Pin file shape, deterministic ordering, tamper detection, parse table

CLI surface

elastickv-snapshot-decode \
  --input  <fsm-file>          (required)
  --output <directory-root>    (required)
  [--adapter dynamodb,s3,redis,sqs|all]
  [--cluster-id <id>]
  [--include-incomplete-uploads]
  [--include-orphans]
  [--preserve-sqs-visibility]
  [--include-sqs-side-records]
  [--rename-collisions]
  [--dynamodb-bundle-mode per-item|jsonl]

Matches the design's §"Tooling" decoder section, minus the deferred --input-snap-dir, --scope, and --dynamodb-bundle-size flags (called out in the design as later additions).

Output shape

MANIFEST.json carries phase = "phase0-snapshot-decode", the filename-derived snapshot_index, the dispatcher-surfaced last_commit_ts, exclusion flags from the CLI, and a non-nil Adapter{} pointer for every enabled adapter (per-scope enumeration is a follow-up).

CHECKSUMS is the standard sha256sum --binary shape — verifiable with sha256sum -c CHECKSUMS from the dump root, no elastickv binary involved.

Test plan

go test -race ./internal/backup/ ./cmd/elastickv-snapshot-decode/ (passes)
golangci-lint run ./... over the touched packages (clean — 0 issues)
go build ./... (clean)
Golden-path test: synthetic !redis|str|greeting → redis/db_0/strings/greeting.bin + MANIFEST.json + CHECKSUMS + VerifyChecksums round-trip

Self-review (CLAUDE.md §"Self-review of code changes")

Data loss — CHECKSUMS is written last so a crash mid-dump leaves an intact-but-unchecked tree; MkdirAll(outputRoot) is idempotent.
Concurrency — single-threaded; no shared state across goroutines.
Performance — sha256 streams via io.Copy; multi-GB files pass without buffering. CHECKSUMS walk is O(files).
Data consistency — MANIFEST.last_commit_ts comes from the dispatcher header (now surfaced via ReadSnapshotWithHeader so empty-snapshot dumps still populate it); snapshot_index parsed from the filename per the documented live-writer convention; encryption-state failures propagate from DecodeSnapshot as ErrSnapshotEncryptedEntry.
Test coverage — five CLI tests (golden path + four flag-validation paths); seven backup-package tests for the new helpers; existing backup-package tests still pass.

Summary by CodeRabbit

New Features
- New CLI to decode local snapshots into vendor-independent directory trees with adapter selection, output options, and manifest/CHECKSUMS generation
- CHECKSUMS written in sha256sum-compatible format for integrity verification; manifest includes version, cluster metadata and parsed snapshot index when available
Bug Fixes / Security
- CHECKSUMS verification rejects path-traversal and symlink escape attempts
Tests
- Comprehensive tests covering decoding, manifesting, checksums, parsing, and security regressions

coderabbitai · 2026-05-23T12:21:04Z

Warning

Review limit reached

@bootjp, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 19 minutes and 33 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1e2286e-dd6c-493f-929f-decc7f2fcf74

📥 Commits

Reviewing files that changed from the base of the PR and between ec358ab and 2822f2d.

📒 Files selected for processing (2)

internal/backup/checksums.go
internal/backup/checksums_test.go

📝 Walkthrough

Walkthrough

Adds a new CLI command elastickv-snapshot-decode that decodes Pebble .fsm snapshots into per-adapter dump trees, emits a Phase 0a MANIFEST.json (with optional snapshot index), and writes sha256sum-compatible CHECKSUMS; includes snapshot-index parsing and comprehensive tests.

Changes

Snapshot Decode CLI

Layer / File(s)	Summary
SHA-256 checksum implementation `internal/backup/checksums.go`	`WriteChecksums` walks a dump root, computes lowercase SHA-256 for regular files, escapes paths, sorts entries, and writes a sha256sum-compatible `CHECKSUMS`. `VerifyChecksums` streams and validates lines, re-hashes referenced files, and enforces path/symlink safety.
Checksum tests and regressions `internal/backup/checksums_test.go`	Pinned on-disk `CHECKSUMS` formatting, self-exclusion, determinism, tampering detection, traversal/symlink rejection, escape round-trips, streaming scalability, and unit tests for path validation/parsing.
Snapshot index extraction from filesystem paths `internal/backup/source.go`, `internal/backup/source_test.go`	`SnapshotIndexFromPath` extracts a base-10 `uint64` from a `<index>.fsm` basename or returns `ErrSnapshotIndexUnparseable`; tests cover valid and invalid filename forms.
CLI entrypoint and flag parsing `cmd/elastickv-snapshot-decode/main.go` (lines 1–209)	Adds `main` and testable `run`; `parseFlags` validates `--input`/`--output`, parses adapter CSV (including `all`), validates `--dynamodb-bundle-mode`, and exposes sentinel errors for unknown adapters and invalid bundle modes.
Decode orchestration and manifest emission `cmd/elastickv-snapshot-decode/main.go` (lines 210–351)	`decodeOne` opens the FSM, calls `backup.DecodeSnapshot` with CLI-derived options and a warn sink, logs counters and `last_commit_ts`, emits `MANIFEST.json` (version, cluster-id, last commit ts, optional snapshot index, source path, adapter scopes, exclusion flags), and invokes `WriteChecksums`.
CLI integration and unit tests `cmd/elastickv-snapshot-decode/main_test.go`	Golden-path test synthesizes a Redis string snapshot and asserts decoded blob bytes, `MANIFEST.json` fields (phase, snapshot index, cluster id), and `CHECKSUMS` verification. Additional tests cover missing `--input`, unknown adapters, zero-adapter selection, and unparseable `.fsm` filename behavior.

Sequence Diagram

sequenceDiagram
  participant CLI as CLI Entrypoint
  participant Parser as Flag Parser
  participant Decoder as backup.DecodeSnapshot
  participant Manifest as Manifest Emitter
  participant Checksum as WriteChecksums

  CLI->>Parser: parseFlags(argv)
  Parser->>Parser: validate input/output and adapter set
  Parser->>Decoder: run decodeOne with config
  Decoder->>Decoder: open FSM snapshot
  Decoder->>Decoder: decode snapshot bytes
  Decoder->>Manifest: emitManifest (version, cluster-id, timestamp, snapshot_index)
  Manifest->>Manifest: populate adapter scopes and write MANIFEST.json
  Decoder->>Checksum: WriteChecksums(output root)
  Checksum->>Checksum: walk tree, hash files, write CHECKSUMS
  Decoder-->>CLI: return success or error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I dug through bytes of snapshot soil,
Framed keys and values with careful toil,
Manifests stamped and checksums sewn tight,
Per-adapter trees tidy and right,
Phase 0a hops into the light.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: adding a new CLI command (elastickv-snapshot-decode) and a CHECKSUMS writer subsystem for backup operations.
Docstring Coverage	✅ Passed	Docstring coverage is 91.84% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch backup/snapshot-decode-cmd

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request introduces the elastickv-snapshot-decode CLI tool and supporting internal/backup package logic to decode Pebble snapshots into a vendor-independent directory structure. The implementation includes manifest generation, snapshot index extraction, and a sha256 checksum utility. Review feedback identifies a path traversal vulnerability and memory inefficiency in the checksum verification logic, recommends using filepath.Join for cross-platform path handling, and advises against silently ignoring errors from Close() calls.

gemini-code-assist · 2026-05-23T12:23:24Z

+	body, err := os.ReadFile(filepath.Join(root, CHECKSUMSFilename)) //nolint:gosec // operator-supplied output dir
+	if err != nil {
+		return errors.WithStack(err)
+	}
+	lines := strings.Split(strings.TrimRight(string(body), "\n"), "\n")
+	for i, line := range lines {
+		if line == "" {
+			continue
+		}
+		want, rel, ok := splitChecksumLine(line)
+		if !ok {
+			return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i+1, line)
+		}
+		got, err := sha256File(filepath.Join(root, rel))
+		if err != nil {
+			return errors.Wrapf(err, "verifying %s", rel)
+		}
+		if got != want {
+			return errors.Wrapf(ErrChecksumMismatch, "%s: have %s want %s", rel, got, want)
+		}
+	}


This implementation has two significant issues:

Efficiency & Scalability: os.ReadFile followed by strings.Split reads the entire CHECKSUMS file into memory and creates multiple copies of the data. For large dump trees with many files, this can lead to excessive memory consumption or OOM. Using bufio.Scanner to process the file line-by-line is much more efficient.

Security (Path Traversal): The rel path parsed from the checksum file is joined with root without validation. A malicious CHECKSUMS file could contain absolute paths or paths with .. segments, allowing an attacker to trick the tool into reading and hashing arbitrary files on the system. Use filepath.IsLocal(rel) to ensure the path is relative and stays within the intended directory.

Note: You will need to add "bufio" to your imports.

f, err := os.Open(filepath.Join(root, CHECKSUMSFilename)) //nolint:gosec // operator-supplied output dir if err != nil { return errors.WithStack(err) } defer func() { _ = f.Close() }() scanner := bufio.NewScanner(f) for i := 1; scanner.Scan(); i++ { line := scanner.Text() if line == "" { continue } want, rel, ok := splitChecksumLine(line) if !ok { return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i, line) } if !filepath.IsLocal(rel) { return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: path %q is not local", i, rel) } got, err := sha256File(filepath.Join(root, rel)) if err != nil { return errors.Wrapf(err, "verifying %s", rel) } if got != want { return errors.Wrapf(ErrChecksumMismatch, "%s: have %s want %s", rel, got, want) } } if err := scanner.Err(); err != nil { return errors.WithStack(err) }

gemini-code-assist · 2026-05-23T12:23:24Z

+	if cfg.bundleJSONL {
+		m.DynamoDBLayout = backup.DynamoDBLayoutJSONL
+	}
+	out, err := os.Create(cfg.outputRoot + "/MANIFEST.json") //nolint:gosec // operator-supplied path


Use filepath.Join instead of string concatenation for path construction. This ensures cross-platform compatibility and avoids issues with redundant or missing path separators.

Suggested change

out, err := os.Create(cfg.outputRoot + "/MANIFEST.json") //nolint:gosec // operator-supplied path

out, err := os.Create(filepath.Join(cfg.outputRoot, "MANIFEST.json")) //nolint:gosec // operator-supplied path

gemini-code-assist · 2026-05-23T12:23:24Z

+	if err != nil {
+		return errors.WithStack(err)
+	}
+	defer func() { _ = out.Close() }()


Do not silently ignore errors from Close() methods on resources. Log them to ensure resource leaks or other cleanup problems are visible. Since this is a writer, a failure in Close() could indicate a problem flushing final data to disk, even after a successful Sync().

if err := out.Close(); err != nil { log.Errorf("failed to close file: %v", err) }

References

Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c3cd69fa0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T12:25:50Z

+	defer func() { _ = in.Close() }()
+
+	res, err := backup.DecodeSnapshot(in, backup.DecodeOptions{
+		OutRoot:                  cfg.outputRoot,


Set a separate scratch root for snapshot decode

The CLI calls DecodeSnapshot without ScratchRoot, which makes the dispatcher default scratch to OutRoot; in the S3 path that becomes <output>/s3, and S3Encoder.Finalize removes that scratch tree. With default adapters (which include S3), any snapshot containing S3 records can lose the emitted s3/ output before CHECKSUMS is written, producing incomplete backups even though decode otherwise succeeds.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-23T12:25:50Z

+		if !ok {
+			return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i+1, line)
+		}
+		got, err := sha256File(filepath.Join(root, rel))


Reject CHECKSUMS paths that escape the dump root

VerifyChecksums uses filepath.Join(root, rel) on unvalidated paths from CHECKSUMS. A tampered line like ../... is accepted by splitChecksumLine, so verification can read files outside the dump and still report success, which breaks the guarantee that verification only attests to files inside the backup tree.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

cmd/elastickv-snapshot-decode/main_test.go (1)
216-216: 💤 Low value

Consider refactoring to remove //nolint:gocritic.

The guideline states to avoid //nolint annotations unless absolutely required and prefer refactoring. The append pattern can be replaced with explicit allocation:
♻️ Proposed refactor
-	encKey := append([]byte(backup.RedisStringPrefix+userKey), make([]byte, snapshotTSSize)...) //nolint:gocritic // explicit append for clarity
+	prefix := backup.RedisStringPrefix + userKey
+	encKey := make([]byte, len(prefix)+snapshotTSSize)
+	copy(encKey, prefix)
As per coding guidelines: "Avoid adding //nolint unless absolutely required and prefer refactoring instead."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/elastickv-snapshot-decode/main_test.go` at line 216, The test uses append
with a nolint marker when building encKey: replace the append pattern with an
explicit allocation and copy to avoid the //nolint:gocritic; specifically
allocate a byte slice of length
len(backup.RedisStringPrefix)+len(userKey)+snapshotTSSize, copy the prefix
(backup.RedisStringPrefix) and userKey into it, and leave the trailing
snapshotTSSize bytes zeroed so encKey is constructed deterministically without
using append (update the code that sets encKey in the test accordingly).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/backup/checksums.go`:
- Around line 150-156: In VerifyChecksums, validate that the parsed rel path
cannot escape root by calling the existing safeJoinUnderRoot(root, rel) instead
of filepath.Join; on any error from safeJoinUnderRoot return
errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i+1, line) (i.e. map the
helper error to ErrChecksumsMalformedLine), and pass the safe absolute path
returned by safeJoinUnderRoot to sha256File. Also add a unit test for
VerifyChecksums that supplies a CHECKSUMS entry with escaping/absolute rel
values (e.g. "../../etc/passwd" or "/etc/passwd") and assert VerifyChecksums
returns ErrChecksumsMalformedLine.

---

Nitpick comments:
In `@cmd/elastickv-snapshot-decode/main_test.go`:
- Line 216: The test uses append with a nolint marker when building encKey:
replace the append pattern with an explicit allocation and copy to avoid the
//nolint:gocritic; specifically allocate a byte slice of length
len(backup.RedisStringPrefix)+len(userKey)+snapshotTSSize, copy the prefix
(backup.RedisStringPrefix) and userKey into it, and leave the trailing
snapshotTSSize bytes zeroed so encKey is constructed deterministically without
using append (update the code that sets encKey in the test accordingly).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6fbee340-dc70-4747-87bd-7da346cf1d01

📥 Commits

Reviewing files that changed from the base of the PR and between ccae199 and 2c3cd69.

📒 Files selected for processing (6)

cmd/elastickv-snapshot-decode/main.go
cmd/elastickv-snapshot-decode/main_test.go
internal/backup/checksums.go
internal/backup/checksums_test.go
internal/backup/source.go
internal/backup/source_test.go

bootjp · 2026-05-23T13:44:52Z

@claude review

claude · 2026-05-23T13:45:06Z

Claude finished @bootjp's task in 3m 13s —— View job

Code Review: PR #810 — `elastickv-snapshot-decode` CLI + CHECKSUMS writer

Todo list

Load tools
Read all 6 changed files
Analyze for security, correctness, performance, concurrency, test coverage
Post review findings

Overall this is clean, well-structured work — flag parsing, manifest emission, and source.go are all solid. There are two issues that need fixing before merge (one P0 data-loss bug, one security issue) plus a few minor items.

P0 — Data loss: S3 output is destroyed when `ScratchRoot` is not set

File: cmd/elastickv-snapshot-decode/main.go:222

DecodeOptions.ScratchRoot is left empty, so it falls back to OutRoot inside newDispatcher. NewS3Encoder(OutRoot, OutRoot) sets s.scratchRoot = filepath.Join(OutRoot, "s3"). At Finalize time (s3.go:539):

defer func() { _ = os.RemoveAll(s.scratchRoot) }()

This removes <OutRoot>/s3/ — the same directory the S3 encoder just wrote all bucket output into. Any dump containing S3 records silently loses every S3 file, but decodeOne returns nil and WriteChecksums happily writes a CHECKSUMS over the now-empty s3/ subtree.

The DecodeOptions godoc even anticipates this: // Defaults to OutRoot when empty; callers backing the CLI's --scratch-root flag override it. The --scratch-root flag is absent from the CLI.

Fix: create an isolated temp dir for scratch before calling DecodeSnapshot and defer its cleanup:

scratch, err := os.MkdirTemp("", "elastickv-snapshot-scratch-*")
if err != nil {
    return errors.Wrap(err, "create scratch dir")
}
defer func() { _ = os.RemoveAll(scratch) }()

res, err := backup.DecodeSnapshot(in, backup.DecodeOptions{
    OutRoot:     cfg.outputRoot,
    ScratchRoot: scratch,
    ...
})

The golden-path test (TestRun_DecodesSimpleRedisStringSnapshot) only uses a Redis entry so it doesn't catch this. A test with an S3 entry and the default adapter set would reproduce it. Fix this →

P1 — Security: path traversal in `VerifyChecksums`

File: internal/backup/checksums.go:154

got, err := sha256File(filepath.Join(root, rel))

rel comes directly from an attacker-controlled CHECKSUMS file with no validation. A line like:

<any-64-hex-chars>  ../../etc/passwd

passes splitChecksumLine and causes sha256File to read (and hash) /etc/passwd. If the attacker also knows what the hash of that file would be, VerifyChecksums returns nil — attesting to a file outside the dump.

filepath.IsLocal (Go 1.20+, available here) rejects both absolute paths and .. segments cleanly:

if !filepath.IsLocal(rel) {
    return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: path %q escapes dump root", i+1, line)
}

A corresponding test is also missing:

func TestVerifyChecksums_RejectsPathTraversal(t *testing.T) {
    root := t.TempDir()
    // Write a CHECKSUMS with an escaping rel path.
    body := strings.Repeat("a", 64) + "  ../../etc/passwd\n"
    os.WriteFile(filepath.Join(root, CHECKSUMSFilename), []byte(body), 0o600)
    err := VerifyChecksums(root)
    if !errors.Is(err, ErrChecksumsMalformedLine) {
        t.Fatalf("want ErrChecksumsMalformedLine, got %v", err)
    }
}

Note: the safeJoinUnderRoot helper in s3.go (same package) is designed for user-controlled S3 keys and refuses backslashes, NUL bytes, and dot segments — it's too strict for this use case and coupled to ErrS3MalformedKey. filepath.IsLocal is the right tool here. Fix this →

P2 — Minor: `filepath.Join` for MANIFEST path

File: cmd/elastickv-snapshot-decode/main.go:281

out, err := os.Create(cfg.outputRoot + "/MANIFEST.json")

Should be filepath.Join(cfg.outputRoot, "MANIFEST.json") — matches the style used by WriteChecksums one line below and handles any trailing-separator edge cases.

P2 — Minor: silent `Close` error on MANIFEST writer

File: cmd/elastickv-snapshot-decode/main.go:285

defer func() { _ = out.Close() }()

The out.Sync() on line 289 guards against most flush failures, but Close errors are still silently discarded. Since this is a CLI tool (not a hot path), logging the error on defer is the right call:

defer func() {
    if cerr := out.Close(); cerr != nil {
        logger.Warn("close MANIFEST.json", "err", cerr)
    }
}()

(The same pattern appears in WriteChecksums:63 — consistency would help here too.)

P2 — Minor: `os.ReadFile` in `VerifyChecksums` reads entire file into memory

File: internal/backup/checksums.go:141

For large dump trees with thousands of files, os.ReadFile + strings.Split loads the full CHECKSUMS content into memory and creates multiple allocations. A bufio.Scanner approach processes it line-by-line in O(1) memory (as noted by Gemini's review). This matters most when CHECKSUMS is > a few MB, which happens at ~10k+ files.

P3 — Style: `//nolint:gocritic` in test

File: cmd/elastickv-snapshot-decode/main_test.go:216

encKey := append([]byte(backup.RedisStringPrefix+userKey), make([]byte, snapshotTSSize)...) //nolint:gocritic // explicit append for clarity

The project guideline says "Avoid //nolint — refactor instead". CodeRabbit's suggested refactor is straightforward:

prefix := backup.RedisStringPrefix + userKey
encKey := make([]byte, len(prefix)+snapshotTSSize)
copy(encKey, prefix)

Summary

Severity	Issue	File:Line
P0	S3 scratch root = OutRoot → `Finalize` deletes S3 output	`main.go:222`
P1	`VerifyChecksums` reads files outside dump root via path traversal	`checksums.go:154`
P2	String concat for MANIFEST path instead of `filepath.Join`	`main.go:281`
P2	Silent `Close` error on MANIFEST writer	`main.go:285`
P2	`os.ReadFile` memory inefficiency in `VerifyChecksums`	`checksums.go:141`
P3	`//nolint:gocritic` in test, per project guideline	`main_test.go:216`

The P0 and P1 issues are the priority — both have straightforward fixes and the P0 will silently corrupt any dump that contains S3 data. Everything else (source.go, parseFlags, parseAdapterSet, manifest structure) looks correct.

Three reviewer-flagged issues on PR #810: 1. (codex P1, data loss) The dispatcher defaulted ScratchRoot to OutRoot when blank. S3Encoder.Finalize then ran RemoveAll(scratchRoot+"/s3") and wiped the emitted S3 dump subtree before CHECKSUMS could be written. Fix: newDispatcher.ensureS3Scratch now allocates a fresh os.MkdirTemp when the caller leaves ScratchRoot blank, parents it on the dispatcher, and removes it in finalize via a deferred cleanupScratch. Caller-supplied ScratchRoot is unchanged. Test: TestDecodeSnapshot_S3OutputSurvivesDefaultScratch — feeds a synthetic bucket-meta + manifest + blob through DecodeSnapshot with ScratchRoot blank and asserts the assembled S3 body is on disk under <OutRoot>/s3/<bucket>/<obj>. With the bug the file is wiped and the read fails. 2. (coderabbit critical, path traversal) VerifyChecksums joined CHECKSUMS relative paths to root without validating them, so a malicious CHECKSUMS line like `<hex> ../../etc/shadow` would direct the verifier to fingerprint files outside the dump root. Fix: validateChecksumRelPath rejects absolute paths and any path whose filepath.Clean'd form starts with `..` or equals `..`/`.`/empty. Returns ErrChecksumsPathTraversal so callers can branch on errors.Is. Test: TestVerifyChecksums_RejectsTraversalPath — table-driven over five attack shapes (parent-traversal, deep-traversal, absolute, current-dir, empty); each is rejected before any sha256 is computed. 3. (gemini security-high + high, OOM vector) VerifyChecksums used os.ReadFile + strings.Split, materialising the whole CHECKSUMS in memory. A multi-million-file dump or an adversarial newline-free CHECKSUMS could OOM the verifier. Fix: switched to bufio.Scanner with a fixed 8 KiB per-line budget. Streaming throughout — even with a TB of dump artefacts the verifier holds at most one line plus the in-flight sha256 hash state. Test: TestVerifyChecksums_StreamsLargeFile exercises the scanner path with 64 files (a realistic file count for the per-line-scan code path; the cost of a literal-large fixture in CI is not justified vs. the assertion that the streaming API works at all). Caller audit (CLAUDE.md "semantic-change → audit all callers"): - VerifyChecksums: only internal/test callers; no production call sites in the CLI yet. Path-validation is purely a rejection-side tightening; honest CHECKSUMS files are unaffected. Streaming semantics are unobservable to callers. - DecodeSnapshot / newDispatcher: one external caller (cmd/elastickv-snapshot-decode/main.go:222), which leaves ScratchRoot blank and now gets the safe tempdir behavior. No signature change. Self-review: 1. Data loss — fixed (the S3 wipe was data-loss; tempdir isolation prevents it). New temp leaves on hard crash but live system temp eviction handles that. 2. Concurrency — no shared state introduced. 3. Performance — streaming verify is strictly better. 4. Data consistency — path traversal closed; encryption-state handling unchanged. 5. Test coverage — three regression tests, one per finding.

bootjp · 2026-05-23T14:18:18Z

@claude review

Addressed three reviewer findings in 34970f0:

Codex P1 (data loss): dispatcher defaulted ScratchRoot to OutRoot; S3Encoder.Finalize then RemoveAll'd <OutRoot>/s3 and wiped the S3 dump before CHECKSUMS could be written. Fix: newDispatcher.ensureS3Scratch allocates a fresh os.MkdirTemp when blank, parents it on the dispatcher, removes it via deferred cleanupScratch in finalize. Caller-supplied ScratchRoot is unchanged. New regression test TestDecodeSnapshot_S3OutputSurvivesDefaultScratch synthesises a bucket-meta + manifest + blob and asserts the assembled body survives.
CodeRabbit Critical (path traversal): VerifyChecksums joined CHECKSUMS relative paths to root without validating them, so a hostile CHECKSUMS could direct the verifier outside root. Fix: validateChecksumRelPath rejects absolute paths and any cleaned form starting with ... Returns ErrChecksumsPathTraversal. New table test TestVerifyChecksums_RejectsTraversalPath covers five attack shapes.
Gemini security-high + high (OOM vector): switched VerifyChecksums from os.ReadFile + strings.Split to bufio.Scanner with an 8 KiB per-line budget so multi-million-file dumps and adversarial newline-free CHECKSUMS cannot force the verifier to buffer arbitrarily. New test TestVerifyChecksums_StreamsLargeFile exercises the scanner path.

Caller audit per CLAUDE.md (semantic-change → grep all callers):

VerifyChecksums: only internal/test callers; no production call site yet. Path-validation is rejection-side tightening; honest CHECKSUMS unaffected. Streaming is unobservable.
DecodeSnapshot / newDispatcher: one external caller (cmd/elastickv-snapshot-decode/main.go:222), which leaves ScratchRoot blank and now gets the safe tempdir. No signature change.

claude · 2026-05-23T14:18:30Z

Claude finished @bootjp's task in 2m 18s —— View job

Code Review: PR #810 — fixes in `34970f00`

Todo list

Load tools
Read changed files (checksums.go, decode.go, tests)
Verify P1 fix: S3 scratch root isolation
Verify P1 fix: path traversal prevention in VerifyChecksums
Verify P2 fix: bufio.Scanner OOM fix
Check for any remaining issues or regressions
Post final review

The three targeted fixes are all correct. Here's the per-fix analysis, plus a note on what's still open.

Fix 1 — S3 scratch root isolation (Codex P1) ✅

decode.go:259-268 / decode.go:307-312

ensureS3Scratch correctly allocates a fresh os.MkdirTemp when ScratchRoot is blank and parents it on the dispatcher. The order-of-operations in finalize is correct: defer d.cleanupScratch() is registered first, then all adapter Finalize() calls run, so by the time cleanupScratch fires, S3Encoder.Finalize has already run and removed <tmp>/s3/. The subsequent RemoveAll(<tmp>) in cleanupScratch is a harmless no-op on an already-empty dir. One detail to note: s3.go:313 sets s.scratchRoot = filepath.Join(scratchRoot, "s3"), so S3's Finalize removes <tmp>/s3/, not <tmp> itself — so cleanupScratch still has a real job to do (removing the parent temp dir). This is fine.

The regression test TestDecodeSnapshot_S3OutputSurvivesDefaultScratch correctly synthesises a complete bucket-meta + object-manifest + blob sequence and asserts the assembled object body survives at s3/<bucket>/<object> after DecodeSnapshot returns.

Fix 2 — Path traversal in VerifyChecksums (CodeRabbit Critical) ✅

checksums.go:228-252

validateChecksumRelPath is thorough:

Empty and . are caught upfront as ErrChecksumsMalformedLine
Absolute paths caught by filepath.IsAbs(rel) before Clean
filepath.Clean(rel) canonicalises the path, then cleaned == ".." || strings.HasPrefix(cleaned, "../") catches a/../../b → ../b style traversals
Belt-and-braces filepath.IsAbs(cleaned) catches platforms where IsAbs and Clean disagree

The test TestVerifyChecksums_RejectsTraversalPath covers the five important shapes. The dual-error assertion errors.Is(err, ErrChecksumsPathTraversal) || errors.Is(err, ErrChecksumsMalformedLine) is intentional and correct — . and "" surface as malformed (wrong shape), while traversal paths surface as ErrChecksumsPathTraversal. Both signal "do not trust this CHECKSUMS file".

One minor note: the test body for the "empty" case writes "0000... \n" (two spaces then nothing). splitChecksumLine extracts rest[2:] = "" and returns ("0000...", "", true). Then validateChecksumRelPath("") returns ErrChecksumsMalformedLine. This is the right behaviour but the test comment could note that empty relPath goes through the shape-parse-succeeds-but-path-rejects path, not splitChecksumLine returning false. Not a bug.

Fix 3 — bufio.Scanner OOM prevention (Gemini) ✅

checksums.go:26 / checksums.go:172-184

maxChecksumLineLen = 8 * 1024 is well-justified — even a 30-level-deep path at NAME_MAX=255 fits in 8 KiB. sc.Buffer(make([]byte, 0, maxChecksumLineLen), maxChecksumLineLen) correctly sets both the initial slice and the max size, so a corrupt CHECKSUMS without newlines causes bufio.ErrTooLong rather than unbounded allocation.

The refactoring into verifyOneChecksumLine is a clean separation that also helps with the cyclomatic-complexity linter constraint called out in the comment.

Still open from the previous round

These three items were not in scope for 34970f00 and remain as-is in main.go / main_test.go:

Severity	Issue	Location
P2	`cfg.outputRoot + "/MANIFEST.json"` — string concat instead of `filepath.Join`	`main.go:281`
P2	Silent `Close` error on MANIFEST writer	`main.go:285`
P3	`//nolint:gocritic` in test helper	`main_test.go:216`

These are low-risk (P2 string concat is cosmetic on POSIX; P2 silent Close is guarded by out.Sync() two lines above; P3 nolint violates project guidelines). Worth cleaning up before merge but not blockers if stacked PRs are landing fast.

Summary: All three targeted fixes (S3 data loss, path traversal, OOM vector) are correctly implemented and covered by regression tests. The three items above remain open from the previous review round.

Adds the Phase 0a operator-facing binary on top of the dispatcher landed in PR #806. The CLI reads a .fsm snapshot, dispatches it through DecodeSnapshot, and emits MANIFEST.json + CHECKSUMS into the chosen output root. - cmd/elastickv-snapshot-decode/main.go - flag surface: --input, --output, --adapter, --cluster-id, --include-incomplete-uploads, --include-orphans, --preserve-sqs-visibility, --include-sqs-side-records, --rename-collisions, --dynamodb-bundle-mode - testable run() entrypoint (argv slice in, error out; no os.Exit) so main_test.go can drive the binary directly - structured slog warning sink wired into the encoders - MANIFEST.json stamped with snapshot_index parsed from filename, last_commit_ts from the dispatcher header, exclusion flags from CLI, and a non-nil pointer to Adapter{} for every enabled adapter so the on-disk shape is already correct when per-scope enumeration lands - cmd/elastickv-snapshot-decode/main_test.go - end-to-end synthetic Redis-string snapshot round-trips through run(), produces the expected blob file, manifest, and CHECKSUMS, and passes VerifyChecksums - flag-validation paths: missing --input, unknown adapter, zero adapter list, unparseable .fsm filename - parseAdapterSet table test - internal/backup/checksums.go - WriteChecksums(root): walks root, computes sha256, writes a sha256sum(1)-compatible CHECKSUMS file (two-space separator, forward-slash paths, lowercase hex, lexicographic order, excludes CHECKSUMS itself) - VerifyChecksums(root): re-reads the file and surfaces the first mismatch as ErrChecksumMismatch (typed sentinel for the Phase 0b encoder self-test path) - internal/backup/source.go - SnapshotIndexFromPath: parses the <uint64>.fsm filename convention the live writer emits and the CLI stamps into MANIFEST.snapshot_index; soft-fails for off-convention names via ErrSnapshotIndexUnparseable - internal/backup/{checksums,source}_test.go - basic listing, self-exclusion, determinism across runs, happy-path verify, tamper detection - SnapshotIndexFromPath table test covering live-writer layout, hand-rolled names, max-uint64, missing suffix, empty stem, non-numeric, negative, hex stems Self-review: 1. Data loss: no — CHECKSUMS is written last so a crash mid-dump leaves an unchecked but otherwise intact tree the operator can re-checksum; the MkdirAll on outputRoot is idempotent so an operator-supplied pre-existing directory is preserved. 2. Concurrency: single-threaded; main is sequential. 3. Performance: sha256 over each output file streams via io.Copy so multi-GB dump artifacts pass without buffering. CHECKSUMS walk is O(files). 4. Data consistency: MANIFEST.last_commit_ts comes from the dispatcher header (now surfaced via ReadSnapshotWithHeader so empty-snapshot dumps still populate it); snapshot_index parsed from the filename per the documented live-writer convention; encryption-state failures propagate from DecodeSnapshot as ErrSnapshotEncryptedEntry. 5. Test coverage: five CLI tests (golden path + four flag- validation surfaces); seven backup-package tests covering the new helpers; existing backup-package tests still pass.

…ning) Two reviewer-flagged issues on the CHECKSUMS path. The Codex P1 ("default ScratchRoot aliased OutRoot") was already fixed on main by PR #806 follow-ups (c9f23c7 + b9fa4a7 — default to <OutRoot>/.scratch + reject ScratchRoot==OutRoot), so this commit ships only the remaining two findings. 1. (coderabbit critical, path traversal) VerifyChecksums joined CHECKSUMS relative paths to root without validating them, so a malicious CHECKSUMS line like `<hex> ../../etc/shadow` would direct the verifier to fingerprint files outside the dump root. Fix: validateChecksumRelPath rejects absolute paths and any path whose filepath.Clean'd form starts with `..` or equals `..`/`.`/empty. Returns ErrChecksumsPathTraversal so callers can branch on errors.Is. Test: TestVerifyChecksums_RejectsTraversalPath — table-driven over five attack shapes (parent-traversal, deep-traversal, absolute, current-dir, empty); each is rejected before any sha256 is computed. 2. (gemini security-high + high, OOM vector) VerifyChecksums used os.ReadFile + strings.Split, materialising the whole CHECKSUMS in memory. A multi-million-file dump or an adversarial newline-free CHECKSUMS could OOM the verifier. Fix: switched to bufio.Scanner with a fixed 8 KiB per-line budget. Streaming throughout — even with a TB of dump artefacts the verifier holds at most one line plus the in-flight sha256 hash state. Test: TestVerifyChecksums_StreamsLargeFile exercises the scanner path with 64 files; CI cost of a literal-large fixture is not justified vs. the assertion that the streaming API works at all. Caller audit (CLAUDE.md "semantic-change → grep all callers"): - VerifyChecksums: zero production callers. Test-only consumers. Path-validation is rejection-side tightening; honest CHECKSUMS unaffected. Streaming semantics unobservable to callers. - WriteChecksums (unchanged in this commit) has one caller (cmd/elastickv-snapshot-decode/main.go:252) — still passes through. Self-review: 1. Data loss — no regression (S3 scratch fix already on main). 2. Concurrency — none. 3. Performance — streaming verify is strictly better. 4. Data consistency — path traversal closed. 5. Test coverage — two regression tests, one per finding.

chatgpt-codex-connector

💡 Codex Review

elastickv/internal/backup/decode.go

Lines 150 to 151 in 34970f0

    
           if err != nil { 
        
           	return DecodeResult{}, err

Clean up owned scratch dir on snapshot read failures

When ReadSnapshotWithHeader returns an error, DecodeSnapshot exits before calling d.finalize(), but cleanupScratch is only deferred inside finalize; with default S3 scratch allocation this leaves scratchOwned (and any partially written chunk files) behind in /tmp. Repeated failed decodes can accumulate large temp data and leak decoded remnants, because the new temp-scratch lifecycle is not executed on the error path.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T14:22:00Z

+	if err != nil {
+		return errors.Wrapf(err, "line %d", lineNum)
+	}
+	got, err := sha256File(filepath.Join(root, cleaned))


Refuse symlink paths during checksum verification

VerifyChecksums validates .. traversal in the textual path, but then hashes filepath.Join(root, cleaned) via os.Open, which follows symlinks; an untrusted dump can include CHECKSUMS entry safe/file where safe/file is a symlink to a host path outside the backup root, and verification will read/hash that external file. This breaks the intended guarantee that verification is confined to the dump tree and can produce false trust decisions based on external filesystem state.

Useful? React with 👍 / 👎.

bootjp · 2026-05-23T14:22:18Z

@claude review

r1 (force-pushed b60f468 after rebase onto main):

Codex P1 dropped — already fixed on main by PR backup: snapshot dispatcher (Phase 0a routing layer) #806 follow-ups (c9f23c75 defaults ScratchRoot to <OutRoot>/.scratch; b9fa4a7a rejects ScratchRoot==OutRoot cleaned-equal). My initial r1 tempdir approach was redundant; reverted and rebased.
CodeRabbit Critical (path traversal): VerifyChecksums now path-validates every CHECKSUMS line via validateChecksumRelPath — rejects absolute paths and any cleaned form starting with ... Returns ErrChecksumsPathTraversal. New table test covers parent-traversal, deep-traversal, absolute, current-dir, empty.
Gemini security-high + high (OOM): switched VerifyChecksums from os.ReadFile + strings.Split to bufio.Scanner with an 8 KiB per-line budget. New test TestVerifyChecksums_StreamsLargeFile exercises the scanner path.

Caller audit: VerifyChecksums has no production callers (test-only); WriteChecksums has one caller (CLI), unchanged in this commit.

claude · 2026-05-23T14:22:29Z

Claude finished @bootjp's task in 3m 0s —— View job

Code Review: PR #810 — r1 (`b60f468`)

Todo list

Load tools
Read current state of all changed files + decode.go
Verify path traversal fix (validateChecksumRelPath)
Verify OOM fix (bufio.Scanner)
Check remaining open items from previous round
Check new Codex issues (scratch dir leak, symlink bypass)
Post review findings

Verified fixes ✅

Codex P1 (ScratchRoot) — correctly dropped. decode.go:199-208 defaults ScratchRoot to OutRoot/.scratch (never OutRoot itself), and decode.go:216-220 hard-rejects ScratchRoot == OutRoot cleaned-equal as ErrDecodeOptionsInvalid. The CLI at main.go:222 leaves ScratchRoot empty and gets the safe default. ✓

CodeRabbit Critical (path traversal) — correctly fixed. validateChecksumRelPath (checksums.go:228-252) catches all four attack shapes in the right order: empty/. → ErrChecksumsMalformedLine; filepath.IsAbs(rel) before Clean; .. or ../ prefix after Clean; belt-and-braces filepath.IsAbs(cleaned) for the edge case where Clean and IsAbs disagree on platform. The five-case table test in checksums_test.go:125-159 pins each arm. ✓

Gemini OOM — correctly fixed. bufio.Scanner with maxChecksumLineLen = 8 KiB at checksums.go:172-184 bounds per-line memory regardless of file size. The 8 KiB cap is justified by the comment (a 30-deep tree of NAME_MAX=255 paths stays under 8 KiB). sc.Buffer call correctly sets both the initial capacity and the hard max. The streaming test at checksums_test.go:167-180 exercises this path. ✓

Codex P2 "scratch dir leak" — incorrect finding

The Codex comment on decode.go:150-151 claims that finalize() is not called when ReadSnapshotWithHeader returns an error. This is wrong. Looking at decode.go:163:

fErr := d.finalize()   // called unconditionally, AFTER ReadSnapshotWithHeader
if err == nil {
    err = fErr
}

The code comment even says "finalize() runs unconditionally so encoders (notably the S3 encoder's scratch tree) get cleanup even when the stream is truncated." The scratch dir (OutRoot/.scratch) is cleaned up on every exit path. This Codex finding can be disregarded.

P1 — Symlink bypass in `VerifyChecksums` (new, Codex `checksums.go:204`)

The Codex comment is a real finding. validateChecksumRelPath guards against textual .. traversal, but sha256File(filepath.Join(root, cleaned)) calls os.Open which follows symlinks. A hostile dump package containing a file safe/link that is a symlink to /etc/shadow — with a matching CHECKSUMS line <sha256-of-shadow> safe/link — would pass verification by fingerprinting the external file.

WriteChecksums is safe (it skips non-regular files at checksums.go:102), but VerifyChecksums doesn't enforce the same constraint. The fix is one os.Lstat call in verifyOneChecksumLine before sha256File:

fi, err := os.Lstat(filepath.Join(root, cleaned))
if err != nil {
    return errors.Wrapf(err, "stat %s", cleaned)
}
if fi.Mode()&os.ModeSymlink != 0 {
    return errors.Wrapf(ErrChecksumsMalformedLine, "%s: is a symlink, dump tree should not contain symlinks", cleaned)
}

Threat-model note: exploiting this requires the attacker to control both the dump tree (plant the symlink) and the CHECKSUMS (list it with the correct digest). A dump produced by a stock DecodeSnapshot invocation will never contain symlinks. The risk is real primarily when verifying dumps from untrusted or third-party sources. A test should accompany the fix:

func TestVerifyChecksums_RejectsSymlink(t *testing.T) {
    root := t.TempDir()
    target := t.TempDir()
    mustWrite(t, filepath.Join(target, "secret"), []byte("secret"))
    os.Symlink(filepath.Join(target, "secret"), filepath.Join(root, "link"))
    // Write a CHECKSUMS that lists the symlink with its real content hash.
    h, _ := sha256File(filepath.Join(target, "secret"))
    body := h + "  link\n"
    mustWrite(t, filepath.Join(root, CHECKSUMSFilename), []byte(body))
    if err := VerifyChecksums(root); err == nil {
        t.Fatal("expected error for symlink entry, got nil")
    }
}

Fix this →

Still open from previous round

Severity	Issue	Location
P2	`cfg.outputRoot + "/MANIFEST.json"` — string concat instead of `filepath.Join`	`main.go:281`
P2	`defer func() { _ = out.Close() }()` — silent Close error; `Sync()` doesn't fully cover it	`main.go:285`
P3	`//nolint:gocritic` in test violates "Avoid //nolint — refactor instead"	`main_test.go:216`

The P3 nolint refactor (allocate + copy instead of append) and P2 items are all low-risk; the symlink bypass is the only new substantive finding this round.

Codex r2 P1: after the r1 textual `..`-traversal guard, VerifyChecksums still followed symlinks when sha256-ing the joined path. A CHECKSUMS line `<hex> safe/file` could direct the verifier to hash a host file outside the dump root via a planted symlink at `safe/file` (or any intermediate parent). Fix: refuseSymlinkComponents walks the relative path segment by segment from root, os.Lstat()s each component, and returns ErrChecksumsSymlinkEscape the moment any component is a symlink. Catches BOTH terminal symlinks AND intermediate ones (a parent directory along the way being symlinked). The per-component walk is stricter than syscall O_NOFOLLOW (which only covers the terminal case) and avoids the filepath.EvalSymlinks resolution we are trying to refuse. Phase 0a dumps contain only regular files by design, so refusing every symlink found in the path is the right policy for the only honest dump shape. Tests: - TestVerifyChecksums_RejectsTerminalSymlink — plant a symlink at safe/escape pointing at a host file, add a CHECKSUMS line for it; assert ErrChecksumsSymlinkEscape. - TestVerifyChecksums_RejectsIntermediateSymlink — plant a symlink at safe/ (a directory link to an outside dir), reference safe/target.txt in CHECKSUMS; same assertion. - Both tests skip on Windows (os.Symlink requires SeCreateSymbolicLinkPrivilege there; the lstat-walk fix itself runs on every platform). Caller audit (CLAUDE.md "semantic-change → grep all callers"): - refuseSymlinkComponents is only called from verifyOneChecksumLine, which is only called from VerifyChecksums. VerifyChecksums has no production callers (test-only); CLI uses WriteChecksums. Self-review: 1. Data loss — none (read-side hardening). 2. Concurrency — none. 3. Performance — one extra Lstat per path component during verify; verify is offline-only and rare. 4. Data consistency — symlink-escape closed. 5. Test coverage — two regression tests (terminal + intermediate symlink); existing five-shape traversal-path table test still passes.

bootjp · 2026-05-23T14:26:54Z

@claude review

Addressed codex r2 P1 (symlink escape) in 2bb8cd7:

VerifyChecksums symlink escape: after the r1 textual .. guard, os.Open(filepath.Join(root, cleaned)) would still follow a planted symlink at the terminal or any intermediate component. Fix: refuseSymlinkComponents walks the relative path segment-by-segment, os.Lstat's each, and returns ErrChecksumsSymlinkEscape on the first symlink found. Catches both terminal AND intermediate symlinks (stricter than O_NOFOLLOW, which only covers terminal).
New regression tests: TestVerifyChecksums_RejectsTerminalSymlink and TestVerifyChecksums_RejectsIntermediateSymlink. Both skip on Windows (requires SeCreateSymbolicLinkPrivilege); the fix itself runs on every platform.

Caller audit: refuseSymlinkComponents is only called from verifyOneChecksumLine → VerifyChecksums; the latter has no production callers (test-only). CLI uses WriteChecksums, unchanged.

claude · 2026-05-23T14:27:06Z

Claude finished @bootjp's task in 1m 27s —— View job

Code Review: PR #810 — r2 (`2bb8cd7f`, symlink escape fix)

Todo list

Load tools
Read checksums.go and checksums_test.go (current state)
Read main.go (check remaining open items)
Verify symlink fix correctness (refuseSymlinkComponents)
Check remaining open items from prior rounds
Post review findings

Symlink fix — `refuseSymlinkComponents` ✅

checksums.go:236-253

The implementation is correct and stricter than the alternatives:

Splits rel on string(filepath.Separator) after validateChecksumRelPath has already run filepath.Clean, so separators are OS-normalised before the split. On Linux and macOS the CHECKSUMS forward-slash paths round-trip correctly; on Windows filepath.Clean converts them to backslashes, and the split follows. ✓
os.Lstat at each path component (not os.Stat) means the check never resolves the symlink itself — it examines the symlink node. ✓
Per-component walk catches intermediate symlinks (a directory component that is a symlink pointing outside root), which O_NOFOLLOW or a single os.Lstat(joined) at the end would miss. The comment in the code correctly explains why. ✓
seg == "" skip is belt-and-braces and harmless — filepath.Clean would not produce consecutive separators. ✓

checksums_test.go:162-245

TestVerifyChecksums_RejectsTerminalSymlink calls WriteChecksums before planting the symlink (so CHECKSUMS contains only the honest files), then appends a hand-crafted entry for the symlinked path. refuseSymlinkComponents fires before the hash comparison, so the hardcoded digest in the appended line doesn't need to be correct — the test is not testing hash matching, it's testing symlink rejection. The approach is sound. ✓

TestVerifyChecksums_RejectsIntermediateSymlink uses an all-zeros digest (same rationale) for a path through a directory that is a symlink. Also correct. ✓

Both tests skip on Windows with an appropriate reason. ✓

Still open from previous rounds

These three items were explicitly deferred by the author and remain unchanged:

Severity	Issue	Location
P2	`cfg.outputRoot + "/MANIFEST.json"` — string concat instead of `filepath.Join`	`main.go:281`
P2	`defer func() { _ = out.Close() }()` — silent Close error on MANIFEST writer	`main.go:285`
P3	`//nolint:gocritic` in test helper	`main_test.go:216`

The P2 silent-Close pattern also appears in WriteChecksums at checksums.go:75 (same defer func() { _ = out.Close() }(), though that one is protected by out.Sync() on line 81 too). Neither is a blocker given the Sync() guard, but they're worth noting for consistency.

Overall

The symlink escape fix is correctly implemented and well-tested. The three remaining items are all low-risk and non-blocking — the security-relevant findings (path traversal, symlink escape, OOM) have all been resolved. This is ready to merge once the stacking dependency (#806) lands.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bb8cd7fb0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T14:31:57Z

+	if filepath.IsAbs(rel) {
+		return "", errors.Wrapf(ErrChecksumsPathTraversal, "absolute path: %q", rel)
+	}
+	cleaned := filepath.Clean(rel)


Use filepath.IsLocal for CHECKSUMS path validation

validateChecksumRelPath only rejects absolute and ..-prefixed paths, but it still accepts non-local Windows names like CON, CONOUT$, or COM1. During VerifyChecksums, those names are opened via os.Open(filepath.Join(root, cleaned)), which can access device paths outside the backup tree instead of regular files, so verification can succeed/fail based on host device state rather than dump contents. Replace this guard with a locality check (for example filepath.IsLocal on the original/cleaned path) to enforce the intended “confined to dump root” guarantee across platforms.

Useful? React with 👍 / 👎.

…sLocal) Codex r3 P1: validateChecksumRelPath rejected `..`-traversal and absolute paths but accepted Windows reserved device names like CON / NUL / COM1 / CONOUT$. On Windows, os.Open of those names yields the host's console / device rather than a regular file under the dump root, so verification can succeed/fail based on host device state instead of dump contents. Fix: replace the explicit absolute/traversal checks with filepath.IsLocal (Go 1.20+), which delegates path-locality to the stdlib. On Windows IsLocal additionally rejects the reserved device names; on Unix the names look like ordinary files (and would be hashed if they exist under root, the honest dump shape). Empty / `.` are still explicitly rejected because IsLocal considers `.` local (it stays within the subtree, but it names the root directory, which is not a hashable regular file). The post-Clean recheck below IsLocal is belt-and-braces against a future stdlib change. Tests: - TestValidateChecksumRelPath_RejectsWindowsDeviceNames pins the rejection. The reserved-name check is Windows-only by filepath.IsLocal's contract, so the test skips on non-Windows; the lexical fix itself runs on every platform. - TestValidateChecksumRelPath_AcceptsHonestPaths cross-checks the IsLocal switchover does not regress the five legitimate Phase 0a dump-tree shapes (Redis blob, S3 nested object, DynamoDB schema, SQS messages.jsonl). Caller audit (CLAUDE.md "semantic-change → grep all callers"): - validateChecksumRelPath only called from verifyOneChecksumLine → VerifyChecksums; the latter has no production callers (test-only). The semantic change is purely rejection-side tightening on Windows; honest CHECKSUMS unaffected. Self-review: 1. Data loss — none (read-side hardening). 2. Concurrency — none. 3. Performance — IsLocal is O(path-len); a single pass. 4. Data consistency — Windows device-name escape closed. 5. Test coverage — two new tests; existing five-shape traversal-path table test still passes.

bootjp · 2026-05-23T14:36:37Z

@claude review

Addressed codex r3 P1 in b05ec87:

VerifyChecksums Windows reserved names: validateChecksumRelPath previously rejected ..-traversal and absolute paths but accepted CON / NUL / COM1 / CONOUT$. Fix: replaced the explicit checks with filepath.IsLocal (Go 1.20+), which delegates path-locality to the stdlib. On Windows IsLocal additionally rejects reserved device names; on Unix those strings are ordinary filenames and would only be hashed if they exist under root (the honest dump shape).
Empty / . still rejected explicitly; the post-Clean recheck below IsLocal is belt-and-braces against a future stdlib loosening.
New tests: TestValidateChecksumRelPath_RejectsWindowsDeviceNames (skips on non-Windows since the reserved-name check is platform-specific per IsLocal's contract) and TestValidateChecksumRelPath_AcceptsHonestPaths (cross-checks that the IsLocal switchover does not regress the five legitimate Phase 0a dump-tree shapes).

Caller audit: validateChecksumRelPath is only called from verifyOneChecksumLine → VerifyChecksums; the latter has no production callers (test-only).

claude · 2026-05-23T14:36:49Z

Claude finished @bootjp's task in 1m 12s —— View job

Code Review: PR #810 — r3 (`b05ec878`)

Todo list

Load tools
Read checksums.go and checksums_test.go (current state)
Verify r3 fix: filepath.IsLocal replacement in validateChecksumRelPath
Check remaining open items from prior rounds
Post final review

`filepath.IsLocal` fix — `validateChecksumRelPath` ✅

checksums.go:286-306

The fix is correct and well-structured. Reviewing the call chain:

Empty/. caught explicitly upfront — both surface as ErrChecksumsMalformedLine. Correct: filepath.IsLocal(".") returns true (it is local), so the explicit pre-check is necessary to prevent hashing a root directory. ✓
filepath.IsLocal(rel) called on the raw pre-Clean form. This is the right order — Clean could theoretically normalise a dodgy path that IsLocal would catch, so calling on the original is strictly safer. ✓
filepath.Clean(rel) applied after the IsLocal guard to produce the canonical form returned to the caller. ✓
Belt-and-braces ..-prefix and filepath.IsAbs re-checks after Clean — documented as guards against a future IsLocal stdlib loosening. Harmless and the rationale is sound. ✓

Forward-slash interop on Windows: CHECKSUMS entries are written with forward-slash separators (filepath.ToSlash at checksums.go:119). On Windows, filepath.IsLocal accepts forward slashes and correctly converts before reserved-name checks — this is part of the stdlib contract since Go 1.20. ✓

New tests ✅

TestValidateChecksumRelPath_RejectsWindowsDeviceNames (checksums_test.go:261)

Windows-only skip is correct — filepath.IsLocal only rejects reserved names on Windows; on Unix CON/NUL/etc. are ordinary filenames that would only fail downstream when the file doesn't exist.
Tests validateChecksumRelPath directly (not VerifyChecksums) — the test comment explains why: the os.Lstat step downstream would surface ENOENT on Unix, masking the ErrChecksumsPathTraversal rejection class being pinned. ✓
The seven names cover both CON/NUL/PRN/AUX (device streams), COM1/LPT1 (serial/parallel), and CONOUT$ (extended name form). ✓

TestValidateChecksumRelPath_AcceptsHonestPaths (checksums_test.go:288)

Five Phase 0a dump-tree paths: plain file, Redis, S3, DynamoDB, SQS. Covers all enabled adapters. ✓
Asserts got == filepath.Clean(p) — correctly pins the contract that the function returns the cleaned form. ✓

Still open from previous rounds

These three items were deferred in earlier rounds and remain unchanged:

Severity	Issue	Location
P2	`cfg.outputRoot + "/MANIFEST.json"` — string concat instead of `filepath.Join`	`main.go:281`
P2	`defer func() { _ = out.Close() }()` — silent Close error; `Sync()` doesn't fully cover it	`main.go:285`
P3	`//nolint:gocritic` in test violates "Avoid //nolint — refactor instead"	`main_test.go:216`

None are blockers (the P2 string concat is cosmetic on POSIX; Sync() on line 289 covers the data-flush concern; the P3 nolint is in a test helper). Worth cleaning before merge but not holding up the security-relevant work.

Summary: The r3 fix is clean. filepath.IsLocal is the right delegation point for Windows reserved-name rejection. The call-order (raw rel before Clean) and belt-and-braces post-Clean guards are both correct. The two new tests are well-targeted and the Windows-only skip is properly justified. No new issues. All security-relevant findings across r1–r3 are now resolved.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b05ec8784c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T14:40:18Z

+	}
+	defer func() { _ = out.Close() }()
+	for _, e := range entries {
+		if _, err := fmt.Fprintf(out, "%s  %s\n", e.hex, e.relPath); err != nil {


Escape CHECKSUMS paths before writing line-based output

WriteChecksums writes each filename verbatim into a newline-delimited CHECKSUMS line ("%s %s\n"), but dump paths can legally contain newline characters (for example, S3 object keys are accepted without filtering \n), so one file can be serialized as multiple lines and make VerifyChecksums/sha256sum -c parse a corrupted manifest. In that case the backup becomes unverifiable even though file contents are intact; paths need sha256sum-compatible escaping (or an explicit rejection of newline-bearing names) before emitting the line format.

Useful? React with 👍 / 👎.

Three remaining reviewer findings on PR #810 (one P2, two mediums) that the prior rounds did not address. None is a fail-closed semantic change; together they tighten the CHECKSUMS on-disk format and the cmd's manifest write path against operational shapes the design promises to support. 1. (codex r4 P2) WriteChecksums wrote filenames verbatim into a newline-delimited file. A path containing `\n`, `\r`, or `\\` would corrupt the line-based output and let a single adversarial entry inject a second CHECKSUMS row that the verifier would accept. Fix: formatChecksumLine implements the GNU coreutils sha256sum(1) escape convention — leading `\` on the line plus `\n` / `\r` / `\\` substitutions. splitChecksumLine reverses the convention via unescapeChecksumPath at parse time. `sha256sum -c CHECKSUMS` from the dump root parses both forms identically, preserving the vendor-independent recovery property. Tests: - TestChecksumLine_EscapeRoundTrip — five paths covering `\n`, `\r`, `\\`, multiple escapes mixed, and the no-escape passthrough. - TestChecksumLine_EscapePrefixMatchesGNUFormat — pins the exact `\` + hex + ` ` + escaped-body shape so a future change cannot drift from sha256sum compatibility. - TestSplitChecksumLine_RejectsDanglingEscape — guards against a tampered CHECKSUMS using `\<EOL>` or `\x` for unknown `x`. 2. (gemini r1 medium, main.go:281) emitManifest concatenated `cfg.outputRoot + "/MANIFEST.json"`. On Windows this produces wrong path separators and would refuse a Windows-style output root. Fix: filepath.Join. Standard cross-platform shape. 3. (gemini r1 medium, main.go:285) emitManifest deferred a `_ = out.Close()` that swallowed the Close error. On NFS / FUSE filesystems Close is the authoritative durability signal (writeback errors surface there, not at Write time), so a silently-discarded Close error means the dump-tree invariant (MANIFEST.json on disk) can fail without the cmd surfacing it. Fix: explicit Sync() then Close() with both errors propagated; the defer'd close on error paths runs through `_ = out.Close()` because we have already surfaced the primary failure. Caller audit: - formatChecksumLine / unescapeChecksumPath are new helpers called only from WriteChecksums / splitChecksumLine. No cross-package use. - splitChecksumLine's exported behavior is rejection-side tightening (dangling escape) plus correct round-trip of escape-needing paths. No production caller of VerifyChecksums exists (test-only). - emitManifest is internal to cmd/elastickv-snapshot-decode. The Close-error surface is purely additive — previously- successful runs stay successful. Self-review: 1. Data loss — none; the new escape path PREVENTS data loss (line injection); the Close-error surface PREVENTS silent manifest-write loss on NFS. 2. Concurrency — none. 3. Performance — single Builder per escape-needing line; the no-escape fast path is unchanged. 4. Data consistency — sha256sum(1) escape round-trip is byte-identical (round-trip test pins it). 5. Test coverage — three new tests; existing seven CHECKSUMS tests still pass.

bootjp · 2026-05-23T15:12:11Z

@claude review

Addressed three additional reviewer findings (codex P2 + 2 gemini mediums) in ec358ab:

Codex r4 P2 (WriteChecksums line-injection): WriteChecksums wrote filenames verbatim into a newline-delimited file. A path containing \n, \r, or \\ would corrupt the line-based output. Fix: formatChecksumLine implements GNU coreutils sha256sum(1) escape convention (leading \ + \n/\r/\\ substitutions); splitChecksumLine reverses it via unescapeChecksumPath at parse time. sha256sum -c CHECKSUMS parses both forms identically.
Gemini r1 medium (main.go:281): switched cfg.outputRoot + "/MANIFEST.json" to filepath.Join for cross-platform correctness.
Gemini r1 medium (main.go:285): emitManifest no longer swallows out.Close(). Explicit Sync() then Close() with both errors propagated — required for NFS/FUSE where Close is the authoritative durability signal.

Three new tests pin the round-trip and the parse-side guards (TestChecksumLine_EscapeRoundTrip, TestChecksumLine_EscapePrefixMatchesGNUFormat, TestSplitChecksumLine_RejectsDanglingEscape).

Caller audit: formatChecksumLine / unescapeChecksumPath are new helpers (only called from WriteChecksums / splitChecksumLine); splitChecksumLine is internal; emitManifest is internal to the cmd.

claude · 2026-05-23T15:12:26Z

Claude finished @bootjp's task in 2m 42s —— View job

Code Review: PR #810 — r4 (`ec358ab9`)

Todo list

Load tools
Read checksums.go (escape/unescape logic + formatChecksumLine / unescapeChecksumPath)
Read checksums_test.go (new round-trip and edge-case tests)
Read main.go (filepath.Join + Close fix)
Verify formatChecksumLine / unescapeChecksumPath correctness against GNU sha256sum(1) spec
Verify splitChecksumLine parse correctness with escaped paths
Check remaining open items (P3 nolint)
Post review findings

Fix 1 — `WriteChecksums` line-injection (Codex r4 P2) ✅

checksums.go:98-117 / checksums.go:362-421

The escape/unescape implementation is correct. Traced through the full cycle:

formatChecksumLine(hex, "weird\nfile") → sb escapes \n to two-char sequence \ + n, returns (hex + " weird\nfile", "\") where \n is two chars. Prefix is the single-char \. The fmt.Fprintf(out, "%s%s\n", prefix, line) writes \<hex> weird\nfile as one line terminated by the real 0x0A. ✓
The bufio.Scanner reads exactly that one line (stops at 0x0A). ✓
splitChecksumLine: strings.HasPrefix(line, "\\") fires, strips the prefix byte, extracts body "weird\nfile" (two-char \n), calls unescapeChecksumPath. ✓
unescapeChecksumPath("weird\nfile"): on \ increments i, reads n, writes real 0x0A. Returns the original string. ✓

Edge cases verified:

Backslash-only path (\): formats as \<hex> \\, unescapes back to \. ✓
Unknown escape (\x): unescapeChecksumPath returns (_, false) → splitChecksumLine returns false. ✓
Dangling backslash (trailing \): i >= len(body) guard fires, returns (_, false). ✓
No-escape pass-through: returns empty prefix, no unescape path touched. ✓

The three new tests (TestChecksumLine_EscapeRoundTrip, TestChecksumLine_EscapePrefixMatchesGNUFormat, TestSplitChecksumLine_RejectsDanglingEscape) pin all four arms correctly.

Fix 2 — `filepath.Join` for MANIFEST path ✅

main.go:284

out, err := os.Create(filepath.Join(cfg.outputRoot, "MANIFEST.json"))

Correct. ✓

Fix 3 — Silent `Close` error on MANIFEST writer ✅

main.go:299-310

emitManifest now uses the correct pattern: WriteManifest error → Close (discarded) + return; Sync error → Close (discarded) + return; explicit Close with error propagated. All three early-exit paths surface errors cleanly. The pattern mirrors what the project's storage layer does for write-then-sync durability. ✓

P3 — Two `//nolint` directives remain

checksums.go:103 (new in r4):

sb.Grow(len(relPath) + 4) //nolint:mnd // small heuristic to avoid early grow

Per project guidelines: "Avoid //nolint — refactor instead." This can be replaced with a named constant:

const escapeGrowPad = 4 // one escape sequence adds at most one extra byte, plus prefix
sb.Grow(len(relPath) + escapeGrowPad)

main_test.go:216 (carried over from prior rounds — still open):

encKey := append([]byte(backup.RedisStringPrefix+userKey), make([]byte, snapshotTSSize)...) //nolint:gocritic

Same refactor as before — make + copy over append. Not a blocker since it's in a test helper.

Minor doc note — "binary mode" wording in `WriteChecksums`

checksums.go:52

Each line is <hex> <relative-path>\n (two spaces, "binary" mode in sha256sum vocabulary).

Two-space separator is sha256sum's text mode; binary mode uses *. This is a documentation mistake only — the emitted format (two spaces) is correct and sha256sum -c CHECKSUMS accepts it. Worth fixing before the doc is read by a future restore-tool author who checks against the sha256sum(1) man page.

Summary

Severity	Issue	Location	Status
✅ Fixed	`WriteChecksums` newline-injection via unescaped filenames	`checksums.go`	r4
✅ Fixed	`filepath.Join` for MANIFEST path	`main.go:284`	r4
✅ Fixed	Silent `Close` error on MANIFEST writer	`main.go:285`	r4
P3	`//nolint:mnd` in `formatChecksumLine` (new)	`checksums.go:103`	Open
P3	`//nolint:gocritic` in test helper	`main_test.go:216`	Open (prior)
P3	Doc: "binary mode" should be "text mode" in `WriteChecksums`	`checksums.go:52`	Open

All security-relevant findings across r1–r4 are resolved. The two //nolint violations and the doc wording are the only remaining items — none are blockers. This is ready to merge once #806 lands and the stacking dependency is resolved.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

internal/backup/checksums_test.go (1)

107-117: ⚡ Quick win

Pin ErrChecksumMismatch in the tampering test.

This currently passes on any non-nil error, so a regression to a generic parse or I/O failure would still look green.

Suggested change

-	if err := VerifyChecksums(root); err == nil {
-		t.Fatalf("expected VerifyChecksums to detect tampering, got nil")
-	}
+	err := VerifyChecksums(root)
+	if !errors.Is(err, ErrChecksumMismatch) {
+		t.Fatalf("err = %v, want ErrChecksumMismatch", err)
+	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/checksums_test.go` around lines 107 - 117, Update
TestVerifyChecksums_DetectsTampering to assert the returned error specifically
indicates a checksum mismatch rather than any non-nil error: after calling
VerifyChecksums(root) replace the generic nil check with an errors.Is(err,
ErrChecksumMismatch) (or equivalent comparison) and fail the test if that check
is false; reference the test name TestVerifyChecksums_DetectsTampering, the
VerifyChecksums call, and the ErrChecksumMismatch sentinel so the test only
passes when VerifyChecksums returns the expected checksum-mismatch error.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/backup/checksums.go`:
- Around line 71-82: The WriteChecksums function currently defers out.Close()
and only returns the error from out.Sync(), which discards Close() failures;
replace the deferred close with explicit handling: remove the defer, call
syncErr := out.Sync() after writing, then call closeErr := out.Close(), and
return a wrapped/combined error that surfaces closeErr when present (e.g., if
closeErr != nil return errors.WithStack(fmt.Errorf("%v; close error: %w",
syncErr, closeErr)) else return errors.WithStack(syncErr)). Refer to the file
writer variable out and the existing out.Sync() call when making this change.

---

Nitpick comments:
In `@internal/backup/checksums_test.go`:
- Around line 107-117: Update TestVerifyChecksums_DetectsTampering to assert the
returned error specifically indicates a checksum mismatch rather than any
non-nil error: after calling VerifyChecksums(root) replace the generic nil check
with an errors.Is(err, ErrChecksumMismatch) (or equivalent comparison) and fail
the test if that check is false; reference the test name
TestVerifyChecksums_DetectsTampering, the VerifyChecksums call, and the
ErrChecksumMismatch sentinel so the test only passes when VerifyChecksums
returns the expected checksum-mismatch error.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9ec94dd-3a71-43ec-b607-205c3f1e43fb

📥 Commits

Reviewing files that changed from the base of the PR and between 2c3cd69 and ec358ab.

📒 Files selected for processing (6)

cmd/elastickv-snapshot-decode/main.go
cmd/elastickv-snapshot-decode/main_test.go
internal/backup/checksums.go
internal/backup/checksums_test.go
internal/backup/source.go
internal/backup/source_test.go

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec358ab9ed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T15:20:26Z

+// decodeOne runs the dispatcher against the configured input and
+// writes MANIFEST.json + CHECKSUMS into the output tree.
+func decodeOne(cfg *config, logger *slog.Logger) error {
+	if err := os.MkdirAll(cfg.outputRoot, defaultOutputDirMode); err != nil {


Refuse non-empty output roots before decoding

decodeOne only calls os.MkdirAll and then writes new adapter files into whatever is already under --output. Because WriteChecksums later hashes all regular files under that root, rerunning the CLI into a previously used directory can silently include stale files from an older snapshot in the new CHECKSUMS and dump tree. This makes the resulting backup represent a mixed state rather than the requested snapshot; consider failing fast when --output is non-empty (or cleaning adapter subtrees first).

Useful? React with 👍 / 👎.

Coderabbit r5 Major (checksums.go:82): WriteChecksums returned out.Sync() but discarded the deferred out.Close() error. Same shape as the r4 gemini medium on emitManifest, applied here too. On NFS / some FUSE backends Close is where buffered-writeback failures surface, so a swallowed Close means the CLI can declare success on a truncated or unreadable CHECKSUMS file. Fix: explicit Sync() then Close() with both errors propagated, with the pre-existing defer'd Close swallowed only on the error-return paths (the primary failure has already been surfaced). Extracted writeAllChecksumLines so the parent stays a thin open / write / sync / close skeleton with the durability contract spelled out. Caller audit (CLAUDE.md "semantic-change → grep all callers"): - WriteChecksums has one caller (cmd/elastickv-snapshot-decode/main.go:253), which already surfaces the returned error verbatim via errors.Wrap. The semantic tightening (Close errors now surface) is honored by the existing wrapping; no caller change needed. Also addresses coderabbit r5 nitpick: typed-sentinel assertion in TestVerifyChecksums_DetectsTampering (was checking err != nil generically; now pins ErrChecksumMismatch so a future change that turns the tamper signal into a different error class surfaces as a test failure). Self-review: 1. Data loss — fix prevents silent CHECKSUMS-truncation on NFS / FUSE. 2. Concurrency — none. 3. Performance — no new I/O; just surfaces an existing error. 4. Data consistency — durability contract now honest. 5. Test coverage — typed-sentinel assertion added; existing round-trip tests still pass.

bootjp · 2026-05-23T15:24:37Z

@claude review

Addressed coderabbit r5 Major + nitpick in 7a8e2f3:

Coderabbit Major (checksums.go:82 — Close error swallowed): WriteChecksums returned only Sync()'s error; the deferred Close() was discarded. Fix: explicit Sync() then Close() with both errors propagated, same pattern as r4's emitManifest fix. Extracted writeAllChecksumLines so WriteChecksums stays a thin open/write/sync/close skeleton.
Coderabbit nitpick (TestVerifyChecksums_DetectsTampering): typed-sentinel assertion added — now pins ErrChecksumMismatch instead of any non-nil error, so a future change that turns tamper detection into a different error class surfaces as a test failure.

Caller audit: WriteChecksums has one caller (cmd/elastickv-snapshot-decode/main.go:253), which already surfaces the returned error verbatim via errors.Wrap. The semantic tightening (Close errors now surface) is honored by the existing wrapping.

claude · 2026-05-23T15:24:52Z

Claude finished @bootjp's task in 1m 50s —— View job

Code Review: PR #810 — r5 (`7a8e2f30`)

Todo list

Load tools
Read checksums.go and checksums_test.go (current state)
Verify r5 fix: WriteChecksums Close error propagation
Verify r5 nitpick fix: TestVerifyChecksums_DetectsTampering sentinel assertion
Check for remaining open items
Post review findings

Fix 1 — `WriteChecksums` Close error propagation (CodeRabbit Major) ✅

checksums.go:63-109

The extraction to writeAllChecksumLines and the explicit Sync+Close lifecycle are both correct. Tracing all exit paths from WriteChecksums:

collectChecksumEntries errors → return early, no file opened. ✓
os.Create errors → return early, no file opened. ✓
writeAllChecksumLines errors → _ = out.Close() (correct to discard: primary error is the write failure) + return. ✓
out.Sync() errors → _ = out.Close() (same rationale) + return. ✓
Happy path → out.Close() error is the return value, surfaces to caller. ✓

writeAllChecksumLines(w io.Writer, ...) takes an io.Writer rather than *os.File, which is the right abstraction — the loop owns nothing about durability, the skeleton above it owns the open/sync/close contract. Clean separation.

The comment block at checksums.go:83-90 correctly explains why Close() failures matter on NFS/FUSE backends. Good rationale to leave in.

Fix 2 — `TestVerifyChecksums_DetectsTampering` typed sentinel (CodeRabbit nitpick) ✅

checksums_test.go:111-126

err := VerifyChecksums(root)
if err == nil {
    t.Fatalf("expected VerifyChecksums to detect tampering, got nil")
}
if !errors.Is(err, ErrChecksumMismatch) {
    t.Fatalf("err = %v, want ErrChecksumMismatch", err)
}

Dual-check is the right pattern here: the nil guard gives a clean "no error at all" message, and the errors.Is check pins the specific sentinel. A future refactor that accidentally returns ErrChecksumsMalformedLine from the tamper path will now surface as a test failure. ✓

Still open from prior rounds

Severity	Issue	Location	Status
P2	Non-empty `--output` dir mixes stale files into new CHECKSUMS (Codex r4)	`main.go:214`	Open
P3	`//nolint:mnd` in `formatChecksumLine` — named constant would fix	`checksums.go:129`	Open (r4)
P3	`//nolint:gocritic` in test helper	`main_test.go:216`	Open (early rounds)
P3	Doc: two-space separator is text mode, not "binary" mode in sha256sum(1)	`checksums.go:49`	Open (r4)

The P2 non-empty-output-root finding (Codex r4) is worth noting: if the operator reruns decodeOne into an existing --output directory, WriteChecksums will hash all regular files found there — including stale files from a prior decode run — and produce a CHECKSUMS that attests to a mixed-snapshot tree. The fix would be a preflight check that rejects a non-empty outputRoot (or documents "caller must use a fresh directory"). Not a blocker if the CLI docs/--help call this out, but it's easy to misuse silently.

The three P3 items are cosmetic. The security-relevant findings across all five rounds are resolved. This is ready to merge once #806 lands.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a8e2f308e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-23T15:30:35Z

+	if line == "" {
+		return nil


Fail verification when CHECKSUMS has no valid entries

verifyOneChecksumLine treats blank lines as success, and VerifyChecksums never enforces that at least one checksum row was parsed. As a result, a truncated/corrupted CHECKSUMS file containing only empty lines returns nil, so integrity verification can report success without checking any file content. This diverges from sha256sum -c semantics for malformed/empty manifests and can hide backup corruption.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-23T15:30:35Z

+		if err != nil {
+			return errors.WithStack(err)


Classify missing checksum targets as checksum failures

When a listed file is missing, refuseSymlinkComponents returns the raw os.Lstat error, so callers cannot detect it via errors.Is(err, ErrChecksumMismatch) and it bypasses the verifier's typed checksum-failure path. In practice, partial restores or deleted files produce an unclassified path error instead of a checksum verification failure, which breaks error handling that keys on checksum result types.

Useful? React with 👍 / 👎.

…sification) Two codex P2 findings on r5 (commit 7a8e2f3): 1. (codex P2, checksums.go:256) VerifyChecksums returned nil on a CHECKSUMS file with zero parsed rows (empty file, blank lines only). Phase 0a dumps are non-empty by construction (WriteChecksums lists every regular file), so a zero-row CHECKSUMS is always a producer-side corruption signal, never a benign shape. The verifier was hiding partial-write or truncation failures behind a "no rows checked == OK" answer. Fix: VerifyChecksums now counts verified rows and returns ErrChecksumsEmpty when the count is zero. Blank lines still skip silently (matching sha256sum -c's tolerance of trailing newlines) but do not satisfy the at-least-one-row guard. Test: TestVerifyChecksums_RejectsEmptyChecksumsFile covers three corruption shapes (empty file, blank lines only, trailing-newline only). 2. (codex P2, checksums.go:306) refuseSymlinkComponents returned raw fs.ErrNotExist when a CHECKSUMS-listed file was missing from disk. Callers branching on errors.Is(err, ErrChecksumMismatch) missed this case — partial restores and tamper-deletes produced an unclassified path error instead. Fix: detect fs.ErrNotExist at the refuseSymlinkComponents call site (and at sha256File's call site as belt-and-braces) and promote to ErrChecksumMismatch. The design's contract ("Files referenced by CHECKSUMS but missing on disk surface as the same error class so a partial-restore is obvious") is now honored. Test: TestVerifyChecksums_MissingTargetIsMismatch — writes a CHECKSUMS row, deletes the listed file, asserts ErrChecksumMismatch via errors.Is. API shape: verifyOneChecksumLine returns (bool, error) so the parent loop can count "real" verified rows separately from blank-line skips. Internal helper; signature change does not escape the package. Caller audit: - verifyOneChecksumLine is internal to checksums.go; the signature change is contained. - VerifyChecksums has no production callers yet (test-only). The new ErrChecksumsEmpty and the promoted ErrChecksumMismatch shapes are additive on the typed-sentinel surface; honest CHECKSUMS still verify cleanly. Self-review: 1. Data loss — none; both fixes catch silent-corruption signals. 2. Concurrency — none. 3. Performance — one int counter per file; sha256 streaming unchanged. 4. Data consistency — empty-CHECKSUMS hole closed; missing- target now classified. 5. Test coverage — two new tests; existing thirteen CHECKSUMS tests still pass.

bootjp · 2026-05-24T00:04:44Z

@claude review

Addressed two codex r5 P2 findings in 2822f2d:

Codex P2 (checksums.go:256 — empty CHECKSUMS hidden): VerifyChecksums returned nil on a CHECKSUMS file with zero parsed rows. Phase 0a dumps are non-empty by construction, so an empty CHECKSUMS is always a producer-side corruption signal. Fix: count verified rows and return new typed sentinel ErrChecksumsEmpty when count is zero. Blank-line tolerance preserved (sha256sum -c shape) but does not satisfy the at-least-one-row guard.
Codex P2 (checksums.go:306 — missing target unclassified): refuseSymlinkComponents returned raw fs.ErrNotExist when a listed file was missing. Callers branching on errors.Is(err, ErrChecksumMismatch) missed partial-restore / delete cases. Fix: detect fs.ErrNotExist at the call sites and promote to ErrChecksumMismatch, honoring the design's "missing surfaces as same error class" contract.
verifyOneChecksumLine signature changed to (bool, error) so the parent counts real rows separately from blank-line skips (internal helper, contained).

Two new regression tests: TestVerifyChecksums_RejectsEmptyChecksumsFile (three corruption shapes) and TestVerifyChecksums_MissingTargetIsMismatch.

Caller audit: verifyOneChecksumLine is internal to checksums.go; VerifyChecksums has no production callers (test-only). Both new typed sentinels are additive.

claude · 2026-05-24T00:05:00Z

Claude finished @bootjp's task in 1m 54s —— View job

Code Review: PR #810 — r6 (`2822f2db`)

Todo list

Load tools
Read checksums.go and checksums_test.go (current state)
Verify Fix 1: ErrChecksumsEmpty guard on zero-row CHECKSUMS
Verify Fix 2: ErrNotExist → ErrChecksumMismatch promotion
Verify verifyOneChecksumLine signature change
Check for remaining open items
Post review findings

Both r6 fixes are correct. Detailed analysis below.

Fix 1 — `ErrChecksumsEmpty` on zero-row CHECKSUMS ✅

checksums.go:228-261

The implementation is correct. VerifyChecksums counts verified separately from lineNum; blank-line skips (verifyOneChecksumLine returns (false, nil)) don't increment the counter. The three scanner paths through the empty-file guard:

Empty file (""): sc.Scan() never fires → verified == 0 → ErrChecksumsEmpty. ✓
Blank-lines only ("\n\n\n"): three scans of "", each returns (false, nil) → verified == 0 → ErrChecksumsEmpty. ✓
Malformed lines: verifyOneChecksumLine returns (false, ErrChecksumsMalformedLine) → VerifyChecksums returns immediately without reaching the verified == 0 guard. Correct — a CHECKSUMS with malformed lines surfaces the specific parse error, not the generic empty-guard. ✓

One style note: ErrChecksumsEmpty is declared at checksums.go:263-268, after VerifyChecksums which uses it at line 258. Go is order-independent within a package so this compiles and runs correctly; just slightly unusual to read. Not a bug.

The new test TestVerifyChecksums_RejectsEmptyChecksumsFile covers all three corruption shapes with the correct errors.Is(err, ErrChecksumsEmpty) assertion. ✓

Fix 2 — `ErrNotExist` → `ErrChecksumMismatch` promotion ✅

checksums.go:297-328

The promotion is correctly applied at two distinct call sites:

After refuseSymlinkComponents (checksums.go:305-308): os.Lstat returns fs.ErrNotExist for a missing file — this propagates out of refuseSymlinkComponents as errors.WithStack(err) and is caught here. Correctly promoted.

After sha256File (checksums.go:319-322): This guards the TOCTOU race — if the file was deleted between the os.Lstat in refuseSymlinkComponents and the os.Open in sha256File. The duplication is intentional and the comment explains it. Not dead code.

The comment at checksums.go:298-304 correctly documents that a missing intermediate directory also surfaces as fs.ErrNotExist from refuseSymlinkComponents — for example, rel = "a/b/c.txt" where a/b/ was deleted. Treating this as ErrChecksumMismatch is the right classification: the listed file can't exist, which is operationally the same as tamper or partial-restore.

The new test TestVerifyChecksums_MissingTargetIsMismatch correctly exercises this path. ✓

`verifyOneChecksumLine` signature change ✅

checksums.go:284

Changed from error to (bool, error). The bool signals "this was a real checksum row" (as opposed to a blank-line skip). The parent loop uses it correctly:

got, err := verifyOneChecksumLine(root, sc.Text(), lineNum)
if err != nil {
    return err
}
if got {
    verified++
}

The function is package-private (verifyOneChecksumLine unexported), so the signature change is fully contained. Caller audit: only one call site, in VerifyChecksums. ✓

Still open from prior rounds

Severity	Issue	Location	Status
P3	`//nolint:mnd` in `formatChecksumLine` — named constant would fix	`checksums.go:130`	Open (r4)
P3	`//nolint:gocritic` in test helper	`main_test.go:216`	Open (early rounds)
P3	Doc: two-space separator is text mode, not "binary" mode in sha256sum(1)	`checksums.go:50`	Open (r4)
P2	Non-empty `--output` dir mixes stale files into new CHECKSUMS	`main.go:214`	Open (r5)

The P3 items are cosmetic and the P2 stale-output-root finding is operator-ergonomics (not a correctness bug — the CHECKSUMS is accurate for whatever files are present; it just attests to a mixed tree). None are blockers.

Summary: All security-relevant findings across r1–r6 are resolved. The two ErrNotExist promotion sites are both necessary (primary path + TOCTOU belt-and-braces). The ErrChecksumsEmpty guard correctly distinguishes "zero verified rows" from "some malformed rows" by exiting early on parse errors before reaching the empty guard. Ready to merge once #806 lands.

gemini-code-assist Bot reviewed May 23, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

Comment thread internal/backup/checksums.go Outdated

bootjp added 2 commits May 23, 2026 23:20

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

bootjp force-pushed the backup/snapshot-decode-cmd branch from 34970f0 to b60f468 Compare May 23, 2026 14:22

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

coderabbitai Bot reviewed May 23, 2026

View reviewed changes

Comment thread internal/backup/checksums.go Outdated

chatgpt-codex-connector Bot reviewed May 23, 2026

View reviewed changes

bootjp merged commit fde4ee3 into main May 24, 2026
9 of 11 checks passed

bootjp deleted the backup/snapshot-decode-cmd branch May 24, 2026 06:50

	out, err := os.Create(cfg.outputRoot + "/MANIFEST.json") //nolint:gosec // operator-supplied path
	out, err := os.Create(filepath.Join(cfg.outputRoot, "MANIFEST.json")) //nolint:gosec // operator-supplied path

Conversation

bootjp commented May 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What lands

CLI surface

Output shape

Test plan

Self-review (CLAUDE.md §"Self-review of code changes")

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

bootjp commented May 23, 2026

Uh oh!

claude Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #810 — elastickv-snapshot-decode CLI + CHECKSUMS writer

P0 — Data loss: S3 output is destroyed when ScratchRoot is not set

P1 — Security: path traversal in VerifyChecksums

P2 — Minor: filepath.Join for MANIFEST path

P2 — Minor: silent Close error on MANIFEST writer

P2 — Minor: os.ReadFile in VerifyChecksums reads entire file into memory

P3 — Style: //nolint:gocritic in test

Summary

Uh oh!

bootjp commented May 23, 2026

Uh oh!

claude Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: PR #810 — fixes in 34970f00

Fix 1 — S3 scratch root isolation (Codex P1) ✅

Fix 2 — Path traversal in VerifyChecksums (CodeRabbit Critical) ✅

Fix 3 — bufio.Scanner OOM prevention (Gemini) ✅

Still open from the previous round

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented May 23, 2026

Uh oh!

claude Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

bootjp commented May 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 23, 2026 •

edited

Loading

claude Bot commented May 23, 2026 •

edited

Loading

Code Review: PR #810 — `elastickv-snapshot-decode` CLI + CHECKSUMS writer

P0 — Data loss: S3 output is destroyed when `ScratchRoot` is not set

P1 — Security: path traversal in `VerifyChecksums`

P2 — Minor: `filepath.Join` for MANIFEST path

P2 — Minor: silent `Close` error on MANIFEST writer

P2 — Minor: `os.ReadFile` in `VerifyChecksums` reads entire file into memory

P3 — Style: `//nolint:gocritic` in test

claude Bot commented May 23, 2026 •

edited

Loading

Code Review: PR #810 — fixes in `34970f00`

claude Bot commented May 23, 2026 •

edited

Loading

Code Review: PR #810 — r1 (`b60f468`)

P1 — Symlink bypass in `VerifyChecksums` (new, Codex `checksums.go:204`)

claude Bot commented May 23, 2026 •

edited

Loading

Code Review: PR #810 — r2 (`2bb8cd7f`, symlink escape fix)

Symlink fix — `refuseSymlinkComponents` ✅

claude Bot commented May 23, 2026 •

edited

Loading

Code Review: PR #810 — r3 (`b05ec878`)

`filepath.IsLocal` fix — `validateChecksumRelPath` ✅

claude Bot commented May 23, 2026 •

edited

Loading

Code Review: PR #810 — r4 (`ec358ab9`)

Fix 1 — `WriteChecksums` line-injection (Codex r4 P2) ✅

Fix 2 — `filepath.Join` for MANIFEST path ✅

Fix 3 — Silent `Close` error on MANIFEST writer ✅

P3 — Two `//nolint` directives remain

Minor doc note — "binary mode" wording in `WriteChecksums`