Skip to content

backup: cmd/elastickv-snapshot-decode CLI + CHECKSUMS writer#810

Merged
bootjp merged 7 commits into
mainfrom
backup/snapshot-decode-cmd
May 24, 2026
Merged

backup: cmd/elastickv-snapshot-decode CLI + CHECKSUMS writer#810
bootjp merged 7 commits into
mainfrom
backup/snapshot-decode-cmd

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 23, 2026

Summary

Adds the Phase 0a operator-facing binary on top of the dispatcher in #806. Plus the CHECKSUMS writer / verifier and the <uint64>.fsm filename → snapshot-index parser the CLI needs to stamp MANIFEST.json.

Stacked on backup/snapshot-dispatcher — once #806 merges, this PR's diff collapses to just the CLI + helpers and I'll retarget to main.

What lands

File Purpose
cmd/elastickv-snapshot-decode/main.go Flag parsing, DecodeSnapshot invocation, MANIFEST.json + CHECKSUMS emission
cmd/elastickv-snapshot-decode/main_test.go End-to-end synthetic-snapshot golden path + four flag-validation paths + parseAdapterSet table test
internal/backup/checksums.go WriteChecksums(root) / VerifyChecksums(root) — sha256sum(1)-compatible
internal/backup/source.go SnapshotIndexFromPath<uint64>.fsm filename parser used to stamp MANIFEST.snapshot_index
internal/backup/checksums_test.go, source_test.go Pin file shape, deterministic ordering, tamper detection, parse table

CLI surface

elastickv-snapshot-decode \
  --input  <fsm-file>          (required)
  --output <directory-root>    (required)
  [--adapter dynamodb,s3,redis,sqs|all]
  [--cluster-id <id>]
  [--include-incomplete-uploads]
  [--include-orphans]
  [--preserve-sqs-visibility]
  [--include-sqs-side-records]
  [--rename-collisions]
  [--dynamodb-bundle-mode per-item|jsonl]

Matches the design's §"Tooling" decoder section, minus the deferred --input-snap-dir, --scope, and --dynamodb-bundle-size flags (called out in the design as later additions).

Output shape

MANIFEST.json carries phase = "phase0-snapshot-decode", the filename-derived snapshot_index, the dispatcher-surfaced last_commit_ts, exclusion flags from the CLI, and a non-nil Adapter{} pointer for every enabled adapter (per-scope enumeration is a follow-up).

CHECKSUMS is the standard sha256sum --binary shape — verifiable with sha256sum -c CHECKSUMS from the dump root, no elastickv binary involved.

Test plan

  • go test -race ./internal/backup/ ./cmd/elastickv-snapshot-decode/ (passes)
  • golangci-lint run ./... over the touched packages (clean — 0 issues)
  • go build ./... (clean)
  • Golden-path test: synthetic !redis|str|greetingredis/db_0/strings/greeting.bin + MANIFEST.json + CHECKSUMS + VerifyChecksums round-trip

Self-review (CLAUDE.md §"Self-review of code changes")

  1. Data loss — CHECKSUMS is written last so a crash mid-dump leaves an intact-but-unchecked tree; MkdirAll(outputRoot) is idempotent.
  2. Concurrency — single-threaded; no shared state across goroutines.
  3. Performance — sha256 streams via io.Copy; multi-GB files pass without buffering. CHECKSUMS walk is O(files).
  4. Data consistencyMANIFEST.last_commit_ts comes from the dispatcher header (now surfaced via ReadSnapshotWithHeader so empty-snapshot dumps still populate it); snapshot_index parsed from the filename per the documented live-writer convention; encryption-state failures propagate from DecodeSnapshot as ErrSnapshotEncryptedEntry.
  5. Test coverage — five CLI tests (golden path + four flag-validation paths); seven backup-package tests for the new helpers; existing backup-package tests still pass.

Summary by CodeRabbit

  • New Features

    • New CLI to decode local snapshots into vendor-independent directory trees with adapter selection, output options, and manifest/CHECKSUMS generation
    • CHECKSUMS written in sha256sum-compatible format for integrity verification; manifest includes version, cluster metadata and parsed snapshot index when available
  • Bug Fixes / Security

    • CHECKSUMS verification rejects path-traversal and symlink escape attempts
  • Tests

    • Comprehensive tests covering decoding, manifesting, checksums, parsing, and security regressions

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 19 minutes and 33 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1e2286e-dd6c-493f-929f-decc7f2fcf74

📥 Commits

Reviewing files that changed from the base of the PR and between ec358ab and 2822f2d.

📒 Files selected for processing (2)
  • internal/backup/checksums.go
  • internal/backup/checksums_test.go
📝 Walkthrough

Walkthrough

Adds a new CLI command elastickv-snapshot-decode that decodes Pebble .fsm snapshots into per-adapter dump trees, emits a Phase 0a MANIFEST.json (with optional snapshot index), and writes sha256sum-compatible CHECKSUMS; includes snapshot-index parsing and comprehensive tests.

Changes

Snapshot Decode CLI

Layer / File(s) Summary
SHA-256 checksum implementation
internal/backup/checksums.go
WriteChecksums walks a dump root, computes lowercase SHA-256 for regular files, escapes paths, sorts entries, and writes a sha256sum-compatible CHECKSUMS. VerifyChecksums streams and validates lines, re-hashes referenced files, and enforces path/symlink safety.
Checksum tests and regressions
internal/backup/checksums_test.go
Pinned on-disk CHECKSUMS formatting, self-exclusion, determinism, tampering detection, traversal/symlink rejection, escape round-trips, streaming scalability, and unit tests for path validation/parsing.
Snapshot index extraction from filesystem paths
internal/backup/source.go, internal/backup/source_test.go
SnapshotIndexFromPath extracts a base-10 uint64 from a <index>.fsm basename or returns ErrSnapshotIndexUnparseable; tests cover valid and invalid filename forms.
CLI entrypoint and flag parsing
cmd/elastickv-snapshot-decode/main.go (lines 1–209)
Adds main and testable run; parseFlags validates --input/--output, parses adapter CSV (including all), validates --dynamodb-bundle-mode, and exposes sentinel errors for unknown adapters and invalid bundle modes.
Decode orchestration and manifest emission
cmd/elastickv-snapshot-decode/main.go (lines 210–351)
decodeOne opens the FSM, calls backup.DecodeSnapshot with CLI-derived options and a warn sink, logs counters and last_commit_ts, emits MANIFEST.json (version, cluster-id, last commit ts, optional snapshot index, source path, adapter scopes, exclusion flags), and invokes WriteChecksums.
CLI integration and unit tests
cmd/elastickv-snapshot-decode/main_test.go
Golden-path test synthesizes a Redis string snapshot and asserts decoded blob bytes, MANIFEST.json fields (phase, snapshot index, cluster id), and CHECKSUMS verification. Additional tests cover missing --input, unknown adapters, zero-adapter selection, and unparseable .fsm filename behavior.

Sequence Diagram

sequenceDiagram
  participant CLI as CLI Entrypoint
  participant Parser as Flag Parser
  participant Decoder as backup.DecodeSnapshot
  participant Manifest as Manifest Emitter
  participant Checksum as WriteChecksums

  CLI->>Parser: parseFlags(argv)
  Parser->>Parser: validate input/output and adapter set
  Parser->>Decoder: run decodeOne with config
  Decoder->>Decoder: open FSM snapshot
  Decoder->>Decoder: decode snapshot bytes
  Decoder->>Manifest: emitManifest (version, cluster-id, timestamp, snapshot_index)
  Manifest->>Manifest: populate adapter scopes and write MANIFEST.json
  Decoder->>Checksum: WriteChecksums(output root)
  Checksum->>Checksum: walk tree, hash files, write CHECKSUMS
  Decoder-->>CLI: return success or error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I dug through bytes of snapshot soil,
Framed keys and values with careful toil,
Manifests stamped and checksums sewn tight,
Per-adapter trees tidy and right,
Phase 0a hops into the light.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: adding a new CLI command (elastickv-snapshot-decode) and a CHECKSUMS writer subsystem for backup operations.
Docstring Coverage ✅ Passed Docstring coverage is 91.84% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch backup/snapshot-decode-cmd

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the elastickv-snapshot-decode CLI tool and supporting internal/backup package logic to decode Pebble snapshots into a vendor-independent directory structure. The implementation includes manifest generation, snapshot index extraction, and a sha256 checksum utility. Review feedback identifies a path traversal vulnerability and memory inefficiency in the checksum verification logic, recommends using filepath.Join for cross-platform path handling, and advises against silently ignoring errors from Close() calls.

Comment thread internal/backup/checksums.go Outdated
Comment on lines +141 to +161
body, err := os.ReadFile(filepath.Join(root, CHECKSUMSFilename)) //nolint:gosec // operator-supplied output dir
if err != nil {
return errors.WithStack(err)
}
lines := strings.Split(strings.TrimRight(string(body), "\n"), "\n")
for i, line := range lines {
if line == "" {
continue
}
want, rel, ok := splitChecksumLine(line)
if !ok {
return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i+1, line)
}
got, err := sha256File(filepath.Join(root, rel))
if err != nil {
return errors.Wrapf(err, "verifying %s", rel)
}
if got != want {
return errors.Wrapf(ErrChecksumMismatch, "%s: have %s want %s", rel, got, want)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

This implementation has two significant issues:

  1. Efficiency & Scalability: os.ReadFile followed by strings.Split reads the entire CHECKSUMS file into memory and creates multiple copies of the data. For large dump trees with many files, this can lead to excessive memory consumption or OOM. Using bufio.Scanner to process the file line-by-line is much more efficient.
  2. Security (Path Traversal): The rel path parsed from the checksum file is joined with root without validation. A malicious CHECKSUMS file could contain absolute paths or paths with .. segments, allowing an attacker to trick the tool into reading and hashing arbitrary files on the system. Use filepath.IsLocal(rel) to ensure the path is relative and stays within the intended directory.

Note: You will need to add "bufio" to your imports.

f, err := os.Open(filepath.Join(root, CHECKSUMSFilename)) //nolint:gosec // operator-supplied output dir
if err != nil {
	return errors.WithStack(err)
}
defer func() { _ = f.Close() }()

scanner := bufio.NewScanner(f)
for i := 1; scanner.Scan(); i++ {
	line := scanner.Text()
	if line == "" {
		continue
	}
	want, rel, ok := splitChecksumLine(line)
	if !ok {
		return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i, line)
	}
	if !filepath.IsLocal(rel) {
		return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: path %q is not local", i, rel)
	}
	got, err := sha256File(filepath.Join(root, rel))
	if err != nil {
		return errors.Wrapf(err, "verifying %s", rel)
	}
	if got != want {
		return errors.Wrapf(ErrChecksumMismatch, "%s: have %s want %s", rel, got, want)
	}
}
if err := scanner.Err(); err != nil {
	return errors.WithStack(err)
}

Comment thread cmd/elastickv-snapshot-decode/main.go Outdated
if cfg.bundleJSONL {
m.DynamoDBLayout = backup.DynamoDBLayoutJSONL
}
out, err := os.Create(cfg.outputRoot + "/MANIFEST.json") //nolint:gosec // operator-supplied path
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Use filepath.Join instead of string concatenation for path construction. This ensures cross-platform compatibility and avoids issues with redundant or missing path separators.

Suggested change
out, err := os.Create(cfg.outputRoot + "/MANIFEST.json") //nolint:gosec // operator-supplied path
out, err := os.Create(filepath.Join(cfg.outputRoot, "MANIFEST.json")) //nolint:gosec // operator-supplied path

Comment thread cmd/elastickv-snapshot-decode/main.go Outdated
if err != nil {
return errors.WithStack(err)
}
defer func() { _ = out.Close() }()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Do not silently ignore errors from Close() methods on resources. Log them to ensure resource leaks or other cleanup problems are visible. Since this is a writer, a failure in Close() could indicate a problem flushing final data to disk, even after a successful Sync().

if err := out.Close(); err != nil {
	log.Errorf("failed to close file: %v", err)
}
References
  1. Do not silently ignore errors from Close() methods on resources like network connections. Log them to ensure resource leaks or other cleanup problems are visible.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c3cd69fa0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

defer func() { _ = in.Close() }()

res, err := backup.DecodeSnapshot(in, backup.DecodeOptions{
OutRoot: cfg.outputRoot,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Set a separate scratch root for snapshot decode

The CLI calls DecodeSnapshot without ScratchRoot, which makes the dispatcher default scratch to OutRoot; in the S3 path that becomes <output>/s3, and S3Encoder.Finalize removes that scratch tree. With default adapters (which include S3), any snapshot containing S3 records can lose the emitted s3/ output before CHECKSUMS is written, producing incomplete backups even though decode otherwise succeeds.

Useful? React with 👍 / 👎.

Comment thread internal/backup/checksums.go Outdated
if !ok {
return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i+1, line)
}
got, err := sha256File(filepath.Join(root, rel))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject CHECKSUMS paths that escape the dump root

VerifyChecksums uses filepath.Join(root, rel) on unvalidated paths from CHECKSUMS. A tampered line like ../... is accepted by splitChecksumLine, so verification can read files outside the dump and still report success, which breaks the guarantee that verification only attests to files inside the backup tree.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
cmd/elastickv-snapshot-decode/main_test.go (1)

216-216: 💤 Low value

Consider refactoring to remove //nolint:gocritic.

The guideline states to avoid //nolint annotations unless absolutely required and prefer refactoring. The append pattern can be replaced with explicit allocation:

♻️ Proposed refactor
-	encKey := append([]byte(backup.RedisStringPrefix+userKey), make([]byte, snapshotTSSize)...) //nolint:gocritic // explicit append for clarity
+	prefix := backup.RedisStringPrefix + userKey
+	encKey := make([]byte, len(prefix)+snapshotTSSize)
+	copy(encKey, prefix)

As per coding guidelines: "Avoid adding //nolint unless absolutely required and prefer refactoring instead."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/elastickv-snapshot-decode/main_test.go` at line 216, The test uses append
with a nolint marker when building encKey: replace the append pattern with an
explicit allocation and copy to avoid the //nolint:gocritic; specifically
allocate a byte slice of length
len(backup.RedisStringPrefix)+len(userKey)+snapshotTSSize, copy the prefix
(backup.RedisStringPrefix) and userKey into it, and leave the trailing
snapshotTSSize bytes zeroed so encKey is constructed deterministically without
using append (update the code that sets encKey in the test accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/backup/checksums.go`:
- Around line 150-156: In VerifyChecksums, validate that the parsed rel path
cannot escape root by calling the existing safeJoinUnderRoot(root, rel) instead
of filepath.Join; on any error from safeJoinUnderRoot return
errors.Wrapf(ErrChecksumsMalformedLine, "line %d: %q", i+1, line) (i.e. map the
helper error to ErrChecksumsMalformedLine), and pass the safe absolute path
returned by safeJoinUnderRoot to sha256File. Also add a unit test for
VerifyChecksums that supplies a CHECKSUMS entry with escaping/absolute rel
values (e.g. "../../etc/passwd" or "/etc/passwd") and assert VerifyChecksums
returns ErrChecksumsMalformedLine.

---

Nitpick comments:
In `@cmd/elastickv-snapshot-decode/main_test.go`:
- Line 216: The test uses append with a nolint marker when building encKey:
replace the append pattern with an explicit allocation and copy to avoid the
//nolint:gocritic; specifically allocate a byte slice of length
len(backup.RedisStringPrefix)+len(userKey)+snapshotTSSize, copy the prefix
(backup.RedisStringPrefix) and userKey into it, and leave the trailing
snapshotTSSize bytes zeroed so encKey is constructed deterministically without
using append (update the code that sets encKey in the test accordingly).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6fbee340-dc70-4747-87bd-7da346cf1d01

📥 Commits

Reviewing files that changed from the base of the PR and between ccae199 and 2c3cd69.

📒 Files selected for processing (6)
  • cmd/elastickv-snapshot-decode/main.go
  • cmd/elastickv-snapshot-decode/main_test.go
  • internal/backup/checksums.go
  • internal/backup/checksums_test.go
  • internal/backup/source.go
  • internal/backup/source_test.go

Comment thread internal/backup/checksums.go Outdated
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 3m 13s —— View job


Code Review: PR #810elastickv-snapshot-decode CLI + CHECKSUMS writer

Todo list

  • Load tools
  • Read all 6 changed files
  • Analyze for security, correctness, performance, concurrency, test coverage
  • Post review findings

Overall this is clean, well-structured work — flag parsing, manifest emission, and source.go are all solid. There are two issues that need fixing before merge (one P0 data-loss bug, one security issue) plus a few minor items.


P0 — Data loss: S3 output is destroyed when ScratchRoot is not set

File: cmd/elastickv-snapshot-decode/main.go:222

DecodeOptions.ScratchRoot is left empty, so it falls back to OutRoot inside newDispatcher. NewS3Encoder(OutRoot, OutRoot) sets s.scratchRoot = filepath.Join(OutRoot, "s3"). At Finalize time (s3.go:539):

defer func() { _ = os.RemoveAll(s.scratchRoot) }()

This removes <OutRoot>/s3/ — the same directory the S3 encoder just wrote all bucket output into. Any dump containing S3 records silently loses every S3 file, but decodeOne returns nil and WriteChecksums happily writes a CHECKSUMS over the now-empty s3/ subtree.

The DecodeOptions godoc even anticipates this: // Defaults to OutRoot when empty; callers backing the CLI's --scratch-root flag override it. The --scratch-root flag is absent from the CLI.

Fix: create an isolated temp dir for scratch before calling DecodeSnapshot and defer its cleanup:

scratch, err := os.MkdirTemp("", "elastickv-snapshot-scratch-*")
if err != nil {
    return errors.Wrap(err, "create scratch dir")
}
defer func() { _ = os.RemoveAll(scratch) }()

res, err := backup.DecodeSnapshot(in, backup.DecodeOptions{
    OutRoot:     cfg.outputRoot,
    ScratchRoot: scratch,
    ...
})

The golden-path test (TestRun_DecodesSimpleRedisStringSnapshot) only uses a Redis entry so it doesn't catch this. A test with an S3 entry and the default adapter set would reproduce it. Fix this →


P1 — Security: path traversal in VerifyChecksums

File: internal/backup/checksums.go:154

got, err := sha256File(filepath.Join(root, rel))

rel comes directly from an attacker-controlled CHECKSUMS file with no validation. A line like:

<any-64-hex-chars>  ../../etc/passwd

passes splitChecksumLine and causes sha256File to read (and hash) /etc/passwd. If the attacker also knows what the hash of that file would be, VerifyChecksums returns nil — attesting to a file outside the dump.

filepath.IsLocal (Go 1.20+, available here) rejects both absolute paths and .. segments cleanly:

if !filepath.IsLocal(rel) {
    return errors.Wrapf(ErrChecksumsMalformedLine, "line %d: path %q escapes dump root", i+1, line)
}

A corresponding test is also missing:

func TestVerifyChecksums_RejectsPathTraversal(t *testing.T) {
    root := t.TempDir()
    // Write a CHECKSUMS with an escaping rel path.
    body := strings.Repeat("a", 64) + "  ../../etc/passwd\n"
    os.WriteFile(filepath.Join(root, CHECKSUMSFilename), []byte(body), 0o600)
    err := VerifyChecksums(root)
    if !errors.Is(err, ErrChecksumsMalformedLine) {
        t.Fatalf("want ErrChecksumsMalformedLine, got %v", err)
    }
}

Note: the safeJoinUnderRoot helper in s3.go (same package) is designed for user-controlled S3 keys and refuses backslashes, NUL bytes, and dot segments — it's too strict for this use case and coupled to ErrS3MalformedKey. filepath.IsLocal is the right tool here. Fix this →


P2 — Minor: filepath.Join for MANIFEST path

File: cmd/elastickv-snapshot-decode/main.go:281

out, err := os.Create(cfg.outputRoot + "/MANIFEST.json")

Should be filepath.Join(cfg.outputRoot, "MANIFEST.json") — matches the style used by WriteChecksums one line below and handles any trailing-separator edge cases.


P2 — Minor: silent Close error on MANIFEST writer

File: cmd/elastickv-snapshot-decode/main.go:285

defer func() { _ = out.Close() }()

The out.Sync() on line 289 guards against most flush failures, but Close errors are still silently discarded. Since this is a CLI tool (not a hot path), logging the error on defer is the right call:

defer func() {
    if cerr := out.Close(); cerr != nil {
        logger.Warn("close MANIFEST.json", "err", cerr)
    }
}()

(The same pattern appears in WriteChecksums:63 — consistency would help here too.)


P2 — Minor: os.ReadFile in VerifyChecksums reads entire file into memory

File: internal/backup/checksums.go:141

For large dump trees with thousands of files, os.ReadFile + strings.Split loads the full CHECKSUMS content into memory and creates multiple allocations. A bufio.Scanner approach processes it line-by-line in O(1) memory (as noted by Gemini's review). This matters most when CHECKSUMS is > a few MB, which happens at ~10k+ files.


P3 — Style: //nolint:gocritic in test

File: cmd/elastickv-snapshot-decode/main_test.go:216

encKey := append([]byte(backup.RedisStringPrefix+userKey), make([]byte, snapshotTSSize)...) //nolint:gocritic // explicit append for clarity

The project guideline says "Avoid //nolint — refactor instead". CodeRabbit's suggested refactor is straightforward:

prefix := backup.RedisStringPrefix + userKey
encKey := make([]byte, len(prefix)+snapshotTSSize)
copy(encKey, prefix)

Summary

Severity Issue File:Line
P0 S3 scratch root = OutRoot → Finalize deletes S3 output main.go:222
P1 VerifyChecksums reads files outside dump root via path traversal checksums.go:154
P2 String concat for MANIFEST path instead of filepath.Join main.go:281
P2 Silent Close error on MANIFEST writer main.go:285
P2 os.ReadFile memory inefficiency in VerifyChecksums checksums.go:141
P3 //nolint:gocritic in test, per project guideline main_test.go:216

The P0 and P1 issues are the priority — both have straightforward fixes and the P0 will silently corrupt any dump that contains S3 data. Everything else (source.go, parseFlags, parseAdapterSet, manifest structure) looks correct.

bootjp added a commit that referenced this pull request May 23, 2026
Three reviewer-flagged issues on PR #810:

1. (codex P1, data loss) The dispatcher defaulted ScratchRoot to
   OutRoot when blank. S3Encoder.Finalize then ran
   RemoveAll(scratchRoot+"/s3") and wiped the emitted S3 dump
   subtree before CHECKSUMS could be written.

   Fix: newDispatcher.ensureS3Scratch now allocates a fresh
   os.MkdirTemp when the caller leaves ScratchRoot blank, parents
   it on the dispatcher, and removes it in finalize via a deferred
   cleanupScratch. Caller-supplied ScratchRoot is unchanged.

   Test: TestDecodeSnapshot_S3OutputSurvivesDefaultScratch — feeds
   a synthetic bucket-meta + manifest + blob through DecodeSnapshot
   with ScratchRoot blank and asserts the assembled S3 body is on
   disk under <OutRoot>/s3/<bucket>/<obj>. With the bug the file
   is wiped and the read fails.

2. (coderabbit critical, path traversal) VerifyChecksums joined
   CHECKSUMS relative paths to root without validating them, so a
   malicious CHECKSUMS line like `<hex>  ../../etc/shadow` would
   direct the verifier to fingerprint files outside the dump root.

   Fix: validateChecksumRelPath rejects absolute paths and any
   path whose filepath.Clean'd form starts with `..` or equals
   `..`/`.`/empty. Returns ErrChecksumsPathTraversal so callers
   can branch on errors.Is.

   Test: TestVerifyChecksums_RejectsTraversalPath — table-driven
   over five attack shapes (parent-traversal, deep-traversal,
   absolute, current-dir, empty); each is rejected before any
   sha256 is computed.

3. (gemini security-high + high, OOM vector) VerifyChecksums used
   os.ReadFile + strings.Split, materialising the whole CHECKSUMS
   in memory. A multi-million-file dump or an adversarial
   newline-free CHECKSUMS could OOM the verifier.

   Fix: switched to bufio.Scanner with a fixed 8 KiB per-line
   budget. Streaming throughout — even with a TB of dump
   artefacts the verifier holds at most one line plus the
   in-flight sha256 hash state.

   Test: TestVerifyChecksums_StreamsLargeFile exercises the
   scanner path with 64 files (a realistic file count for the
   per-line-scan code path; the cost of a literal-large fixture
   in CI is not justified vs. the assertion that the streaming
   API works at all).

Caller audit (CLAUDE.md "semantic-change → audit all callers"):

- VerifyChecksums: only internal/test callers; no production
  call sites in the CLI yet. Path-validation is purely a
  rejection-side tightening; honest CHECKSUMS files are
  unaffected. Streaming semantics are unobservable to callers.
- DecodeSnapshot / newDispatcher: one external caller
  (cmd/elastickv-snapshot-decode/main.go:222), which leaves
  ScratchRoot blank and now gets the safe tempdir behavior.
  No signature change.

Self-review:
1. Data loss — fixed (the S3 wipe was data-loss; tempdir
   isolation prevents it). New temp leaves on hard crash but
   live system temp eviction handles that.
2. Concurrency — no shared state introduced.
3. Performance — streaming verify is strictly better.
4. Data consistency — path traversal closed; encryption-state
   handling unchanged.
5. Test coverage — three regression tests, one per finding.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

Addressed three reviewer findings in 34970f0:

  • Codex P1 (data loss): dispatcher defaulted ScratchRoot to OutRoot; S3Encoder.Finalize then RemoveAll'd <OutRoot>/s3 and wiped the S3 dump before CHECKSUMS could be written. Fix: newDispatcher.ensureS3Scratch allocates a fresh os.MkdirTemp when blank, parents it on the dispatcher, removes it via deferred cleanupScratch in finalize. Caller-supplied ScratchRoot is unchanged. New regression test TestDecodeSnapshot_S3OutputSurvivesDefaultScratch synthesises a bucket-meta + manifest + blob and asserts the assembled body survives.
  • CodeRabbit Critical (path traversal): VerifyChecksums joined CHECKSUMS relative paths to root without validating them, so a hostile CHECKSUMS could direct the verifier outside root. Fix: validateChecksumRelPath rejects absolute paths and any cleaned form starting with ... Returns ErrChecksumsPathTraversal. New table test TestVerifyChecksums_RejectsTraversalPath covers five attack shapes.
  • Gemini security-high + high (OOM vector): switched VerifyChecksums from os.ReadFile + strings.Split to bufio.Scanner with an 8 KiB per-line budget so multi-million-file dumps and adversarial newline-free CHECKSUMS cannot force the verifier to buffer arbitrarily. New test TestVerifyChecksums_StreamsLargeFile exercises the scanner path.

Caller audit per CLAUDE.md (semantic-change → grep all callers):

  • VerifyChecksums: only internal/test callers; no production call site yet. Path-validation is rejection-side tightening; honest CHECKSUMS unaffected. Streaming is unobservable.
  • DecodeSnapshot / newDispatcher: one external caller (cmd/elastickv-snapshot-decode/main.go:222), which leaves ScratchRoot blank and now gets the safe tempdir. No signature change.

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 2m 18s —— View job


Code Review: PR #810 — fixes in 34970f00

Todo list

  • Load tools
  • Read changed files (checksums.go, decode.go, tests)
  • Verify P1 fix: S3 scratch root isolation
  • Verify P1 fix: path traversal prevention in VerifyChecksums
  • Verify P2 fix: bufio.Scanner OOM fix
  • Check for any remaining issues or regressions
  • Post final review

The three targeted fixes are all correct. Here's the per-fix analysis, plus a note on what's still open.


Fix 1 — S3 scratch root isolation (Codex P1) ✅

decode.go:259-268 / decode.go:307-312

ensureS3Scratch correctly allocates a fresh os.MkdirTemp when ScratchRoot is blank and parents it on the dispatcher. The order-of-operations in finalize is correct: defer d.cleanupScratch() is registered first, then all adapter Finalize() calls run, so by the time cleanupScratch fires, S3Encoder.Finalize has already run and removed <tmp>/s3/. The subsequent RemoveAll(<tmp>) in cleanupScratch is a harmless no-op on an already-empty dir. One detail to note: s3.go:313 sets s.scratchRoot = filepath.Join(scratchRoot, "s3"), so S3's Finalize removes <tmp>/s3/, not <tmp> itself — so cleanupScratch still has a real job to do (removing the parent temp dir). This is fine.

The regression test TestDecodeSnapshot_S3OutputSurvivesDefaultScratch correctly synthesises a complete bucket-meta + object-manifest + blob sequence and asserts the assembled object body survives at s3/<bucket>/<object> after DecodeSnapshot returns.


Fix 2 — Path traversal in VerifyChecksums (CodeRabbit Critical) ✅

checksums.go:228-252

validateChecksumRelPath is thorough:

  • Empty and . are caught upfront as ErrChecksumsMalformedLine
  • Absolute paths caught by filepath.IsAbs(rel) before Clean
  • filepath.Clean(rel) canonicalises the path, then cleaned == ".." || strings.HasPrefix(cleaned, "../") catches a/../../b../b style traversals
  • Belt-and-braces filepath.IsAbs(cleaned) catches platforms where IsAbs and Clean disagree

The test TestVerifyChecksums_RejectsTraversalPath covers the five important shapes. The dual-error assertion errors.Is(err, ErrChecksumsPathTraversal) || errors.Is(err, ErrChecksumsMalformedLine) is intentional and correct — . and "" surface as malformed (wrong shape), while traversal paths surface as ErrChecksumsPathTraversal. Both signal "do not trust this CHECKSUMS file".

One minor note: the test body for the "empty" case writes "0000... \n" (two spaces then nothing). splitChecksumLine extracts rest[2:] = "" and returns ("0000...", "", true). Then validateChecksumRelPath("") returns ErrChecksumsMalformedLine. This is the right behaviour but the test comment could note that empty relPath goes through the shape-parse-succeeds-but-path-rejects path, not splitChecksumLine returning false. Not a bug.


Fix 3 — bufio.Scanner OOM prevention (Gemini) ✅

checksums.go:26 / checksums.go:172-184

maxChecksumLineLen = 8 * 1024 is well-justified — even a 30-level-deep path at NAME_MAX=255 fits in 8 KiB. sc.Buffer(make([]byte, 0, maxChecksumLineLen), maxChecksumLineLen) correctly sets both the initial slice and the max size, so a corrupt CHECKSUMS without newlines causes bufio.ErrTooLong rather than unbounded allocation.

The refactoring into verifyOneChecksumLine is a clean separation that also helps with the cyclomatic-complexity linter constraint called out in the comment.


Still open from the previous round

These three items were not in scope for 34970f00 and remain as-is in main.go / main_test.go:

Severity Issue Location
P2 cfg.outputRoot + "/MANIFEST.json" — string concat instead of filepath.Join main.go:281
P2 Silent Close error on MANIFEST writer main.go:285
P3 //nolint:gocritic in test helper main_test.go:216

These are low-risk (P2 string concat is cosmetic on POSIX; P2 silent Close is guarded by out.Sync() two lines above; P3 nolint violates project guidelines). Worth cleaning up before merge but not blockers if stacked PRs are landing fast.


Summary: All three targeted fixes (S3 data loss, path traversal, OOM vector) are correctly implemented and covered by regression tests. The three items above remain open from the previous review round.

bootjp added 2 commits May 23, 2026 23:20
Adds the Phase 0a operator-facing binary on top of the dispatcher
landed in PR #806. The CLI reads a .fsm snapshot, dispatches it
through DecodeSnapshot, and emits MANIFEST.json + CHECKSUMS into
the chosen output root.

- cmd/elastickv-snapshot-decode/main.go
  - flag surface: --input, --output, --adapter, --cluster-id,
    --include-incomplete-uploads, --include-orphans,
    --preserve-sqs-visibility, --include-sqs-side-records,
    --rename-collisions, --dynamodb-bundle-mode
  - testable run() entrypoint (argv slice in, error out;
    no os.Exit) so main_test.go can drive the binary directly
  - structured slog warning sink wired into the encoders
  - MANIFEST.json stamped with snapshot_index parsed from filename,
    last_commit_ts from the dispatcher header, exclusion flags
    from CLI, and a non-nil pointer to Adapter{} for every enabled
    adapter so the on-disk shape is already correct when per-scope
    enumeration lands
- cmd/elastickv-snapshot-decode/main_test.go
  - end-to-end synthetic Redis-string snapshot round-trips through
    run(), produces the expected blob file, manifest, and
    CHECKSUMS, and passes VerifyChecksums
  - flag-validation paths: missing --input, unknown adapter, zero
    adapter list, unparseable .fsm filename
  - parseAdapterSet table test
- internal/backup/checksums.go
  - WriteChecksums(root): walks root, computes sha256, writes a
    sha256sum(1)-compatible CHECKSUMS file (two-space separator,
    forward-slash paths, lowercase hex, lexicographic order,
    excludes CHECKSUMS itself)
  - VerifyChecksums(root): re-reads the file and surfaces the
    first mismatch as ErrChecksumMismatch (typed sentinel for the
    Phase 0b encoder self-test path)
- internal/backup/source.go
  - SnapshotIndexFromPath: parses the <uint64>.fsm filename
    convention the live writer emits and the CLI stamps into
    MANIFEST.snapshot_index; soft-fails for off-convention names
    via ErrSnapshotIndexUnparseable
- internal/backup/{checksums,source}_test.go
  - basic listing, self-exclusion, determinism across runs,
    happy-path verify, tamper detection
  - SnapshotIndexFromPath table test covering live-writer layout,
    hand-rolled names, max-uint64, missing suffix, empty stem,
    non-numeric, negative, hex stems

Self-review:
1. Data loss: no — CHECKSUMS is written last so a crash mid-dump
   leaves an unchecked but otherwise intact tree the operator
   can re-checksum; the MkdirAll on outputRoot is idempotent so
   an operator-supplied pre-existing directory is preserved.
2. Concurrency: single-threaded; main is sequential.
3. Performance: sha256 over each output file streams via io.Copy
   so multi-GB dump artifacts pass without buffering. CHECKSUMS
   walk is O(files).
4. Data consistency: MANIFEST.last_commit_ts comes from the
   dispatcher header (now surfaced via ReadSnapshotWithHeader so
   empty-snapshot dumps still populate it); snapshot_index
   parsed from the filename per the documented live-writer
   convention; encryption-state failures propagate from
   DecodeSnapshot as ErrSnapshotEncryptedEntry.
5. Test coverage: five CLI tests (golden path + four flag-
   validation surfaces); seven backup-package tests covering
   the new helpers; existing backup-package tests still pass.
…ning)

Two reviewer-flagged issues on the CHECKSUMS path. The Codex P1
("default ScratchRoot aliased OutRoot") was already fixed on main
by PR #806 follow-ups (c9f23c7 + b9fa4a7 — default to
<OutRoot>/.scratch + reject ScratchRoot==OutRoot), so this commit
ships only the remaining two findings.

1. (coderabbit critical, path traversal) VerifyChecksums joined
   CHECKSUMS relative paths to root without validating them, so a
   malicious CHECKSUMS line like `<hex>  ../../etc/shadow` would
   direct the verifier to fingerprint files outside the dump root.

   Fix: validateChecksumRelPath rejects absolute paths and any
   path whose filepath.Clean'd form starts with `..` or equals
   `..`/`.`/empty. Returns ErrChecksumsPathTraversal so callers
   can branch on errors.Is.

   Test: TestVerifyChecksums_RejectsTraversalPath — table-driven
   over five attack shapes (parent-traversal, deep-traversal,
   absolute, current-dir, empty); each is rejected before any
   sha256 is computed.

2. (gemini security-high + high, OOM vector) VerifyChecksums used
   os.ReadFile + strings.Split, materialising the whole CHECKSUMS
   in memory. A multi-million-file dump or an adversarial
   newline-free CHECKSUMS could OOM the verifier.

   Fix: switched to bufio.Scanner with a fixed 8 KiB per-line
   budget. Streaming throughout — even with a TB of dump
   artefacts the verifier holds at most one line plus the
   in-flight sha256 hash state.

   Test: TestVerifyChecksums_StreamsLargeFile exercises the
   scanner path with 64 files; CI cost of a literal-large
   fixture is not justified vs. the assertion that the streaming
   API works at all.

Caller audit (CLAUDE.md "semantic-change → grep all callers"):

- VerifyChecksums: zero production callers. Test-only consumers.
  Path-validation is rejection-side tightening; honest CHECKSUMS
  unaffected. Streaming semantics unobservable to callers.
- WriteChecksums (unchanged in this commit) has one caller
  (cmd/elastickv-snapshot-decode/main.go:252) — still passes
  through.

Self-review:
1. Data loss — no regression (S3 scratch fix already on main).
2. Concurrency — none.
3. Performance — streaming verify is strictly better.
4. Data consistency — path traversal closed.
5. Test coverage — two regression tests, one per finding.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

if err != nil {
return DecodeResult{}, err

P2 Badge Clean up owned scratch dir on snapshot read failures

When ReadSnapshotWithHeader returns an error, DecodeSnapshot exits before calling d.finalize(), but cleanupScratch is only deferred inside finalize; with default S3 scratch allocation this leaves scratchOwned (and any partially written chunk files) behind in /tmp. Repeated failed decodes can accumulate large temp data and leak decoded remnants, because the new temp-scratch lifecycle is not executed on the error path.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/checksums.go Outdated
if err != nil {
return errors.Wrapf(err, "line %d", lineNum)
}
got, err := sha256File(filepath.Join(root, cleaned))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Refuse symlink paths during checksum verification

VerifyChecksums validates .. traversal in the textual path, but then hashes filepath.Join(root, cleaned) via os.Open, which follows symlinks; an untrusted dump can include CHECKSUMS entry safe/file where safe/file is a symlink to a host path outside the backup root, and verification will read/hash that external file. This breaks the intended guarantee that verification is confined to the dump tree and can produce false trust decisions based on external filesystem state.

Useful? React with 👍 / 👎.

@bootjp bootjp force-pushed the backup/snapshot-decode-cmd branch from 34970f0 to b60f468 Compare May 23, 2026 14:22
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

r1 (force-pushed b60f468 after rebase onto main):

  • Codex P1 dropped — already fixed on main by PR backup: snapshot dispatcher (Phase 0a routing layer) #806 follow-ups (c9f23c75 defaults ScratchRoot to <OutRoot>/.scratch; b9fa4a7a rejects ScratchRoot==OutRoot cleaned-equal). My initial r1 tempdir approach was redundant; reverted and rebased.
  • CodeRabbit Critical (path traversal): VerifyChecksums now path-validates every CHECKSUMS line via validateChecksumRelPath — rejects absolute paths and any cleaned form starting with ... Returns ErrChecksumsPathTraversal. New table test covers parent-traversal, deep-traversal, absolute, current-dir, empty.
  • Gemini security-high + high (OOM): switched VerifyChecksums from os.ReadFile + strings.Split to bufio.Scanner with an 8 KiB per-line budget. New test TestVerifyChecksums_StreamsLargeFile exercises the scanner path.

Caller audit: VerifyChecksums has no production callers (test-only); WriteChecksums has one caller (CLI), unchanged in this commit.

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 3m 0s —— View job


Code Review: PR #810 — r1 (b60f468)

Todo list

  • Load tools
  • Read current state of all changed files + decode.go
  • Verify path traversal fix (validateChecksumRelPath)
  • Verify OOM fix (bufio.Scanner)
  • Check remaining open items from previous round
  • Check new Codex issues (scratch dir leak, symlink bypass)
  • Post review findings

Verified fixes ✅

Codex P1 (ScratchRoot) — correctly dropped. decode.go:199-208 defaults ScratchRoot to OutRoot/.scratch (never OutRoot itself), and decode.go:216-220 hard-rejects ScratchRoot == OutRoot cleaned-equal as ErrDecodeOptionsInvalid. The CLI at main.go:222 leaves ScratchRoot empty and gets the safe default. ✓

CodeRabbit Critical (path traversal) — correctly fixed. validateChecksumRelPath (checksums.go:228-252) catches all four attack shapes in the right order: empty/.ErrChecksumsMalformedLine; filepath.IsAbs(rel) before Clean; .. or ../ prefix after Clean; belt-and-braces filepath.IsAbs(cleaned) for the edge case where Clean and IsAbs disagree on platform. The five-case table test in checksums_test.go:125-159 pins each arm. ✓

Gemini OOM — correctly fixed. bufio.Scanner with maxChecksumLineLen = 8 KiB at checksums.go:172-184 bounds per-line memory regardless of file size. The 8 KiB cap is justified by the comment (a 30-deep tree of NAME_MAX=255 paths stays under 8 KiB). sc.Buffer call correctly sets both the initial capacity and the hard max. The streaming test at checksums_test.go:167-180 exercises this path. ✓


Codex P2 "scratch dir leak" — incorrect finding

The Codex comment on decode.go:150-151 claims that finalize() is not called when ReadSnapshotWithHeader returns an error. This is wrong. Looking at decode.go:163:

fErr := d.finalize()   // called unconditionally, AFTER ReadSnapshotWithHeader
if err == nil {
    err = fErr
}

The code comment even says "finalize() runs unconditionally so encoders (notably the S3 encoder's scratch tree) get cleanup even when the stream is truncated." The scratch dir (OutRoot/.scratch) is cleaned up on every exit path. This Codex finding can be disregarded.


P1 — Symlink bypass in VerifyChecksums (new, Codex checksums.go:204)

The Codex comment is a real finding. validateChecksumRelPath guards against textual .. traversal, but sha256File(filepath.Join(root, cleaned)) calls os.Open which follows symlinks. A hostile dump package containing a file safe/link that is a symlink to /etc/shadow — with a matching CHECKSUMS line <sha256-of-shadow> safe/link — would pass verification by fingerprinting the external file.

WriteChecksums is safe (it skips non-regular files at checksums.go:102), but VerifyChecksums doesn't enforce the same constraint. The fix is one os.Lstat call in verifyOneChecksumLine before sha256File:

fi, err := os.Lstat(filepath.Join(root, cleaned))
if err != nil {
    return errors.Wrapf(err, "stat %s", cleaned)
}
if fi.Mode()&os.ModeSymlink != 0 {
    return errors.Wrapf(ErrChecksumsMalformedLine, "%s: is a symlink, dump tree should not contain symlinks", cleaned)
}

Threat-model note: exploiting this requires the attacker to control both the dump tree (plant the symlink) and the CHECKSUMS (list it with the correct digest). A dump produced by a stock DecodeSnapshot invocation will never contain symlinks. The risk is real primarily when verifying dumps from untrusted or third-party sources. A test should accompany the fix:

func TestVerifyChecksums_RejectsSymlink(t *testing.T) {
    root := t.TempDir()
    target := t.TempDir()
    mustWrite(t, filepath.Join(target, "secret"), []byte("secret"))
    os.Symlink(filepath.Join(target, "secret"), filepath.Join(root, "link"))
    // Write a CHECKSUMS that lists the symlink with its real content hash.
    h, _ := sha256File(filepath.Join(target, "secret"))
    body := h + "  link\n"
    mustWrite(t, filepath.Join(root, CHECKSUMSFilename), []byte(body))
    if err := VerifyChecksums(root); err == nil {
        t.Fatal("expected error for symlink entry, got nil")
    }
}

Fix this →


Still open from previous round

Severity Issue Location
P2 cfg.outputRoot + "/MANIFEST.json" — string concat instead of filepath.Join main.go:281
P2 defer func() { _ = out.Close() }() — silent Close error; Sync() doesn't fully cover it main.go:285
P3 //nolint:gocritic in test violates "Avoid //nolint — refactor instead" main_test.go:216

The P3 nolint refactor (allocate + copy instead of append) and P2 items are all low-risk; the symlink bypass is the only new substantive finding this round.


Codex r2 P1: after the r1 textual `..`-traversal guard,
VerifyChecksums still followed symlinks when sha256-ing the
joined path. A CHECKSUMS line `<hex>  safe/file` could direct
the verifier to hash a host file outside the dump root via a
planted symlink at `safe/file` (or any intermediate parent).

Fix: refuseSymlinkComponents walks the relative path segment by
segment from root, os.Lstat()s each component, and returns
ErrChecksumsSymlinkEscape the moment any component is a
symlink. Catches BOTH terminal symlinks AND intermediate ones
(a parent directory along the way being symlinked). The
per-component walk is stricter than syscall O_NOFOLLOW (which
only covers the terminal case) and avoids the
filepath.EvalSymlinks resolution we are trying to refuse.

Phase 0a dumps contain only regular files by design, so
refusing every symlink found in the path is the right policy
for the only honest dump shape.

Tests:
- TestVerifyChecksums_RejectsTerminalSymlink — plant a symlink
  at safe/escape pointing at a host file, add a CHECKSUMS line
  for it; assert ErrChecksumsSymlinkEscape.
- TestVerifyChecksums_RejectsIntermediateSymlink — plant a
  symlink at safe/ (a directory link to an outside dir),
  reference safe/target.txt in CHECKSUMS; same assertion.
- Both tests skip on Windows (os.Symlink requires
  SeCreateSymbolicLinkPrivilege there; the lstat-walk fix
  itself runs on every platform).

Caller audit (CLAUDE.md "semantic-change → grep all callers"):
- refuseSymlinkComponents is only called from
  verifyOneChecksumLine, which is only called from
  VerifyChecksums. VerifyChecksums has no production callers
  (test-only); CLI uses WriteChecksums.

Self-review:
1. Data loss — none (read-side hardening).
2. Concurrency — none.
3. Performance — one extra Lstat per path component during
   verify; verify is offline-only and rare.
4. Data consistency — symlink-escape closed.
5. Test coverage — two regression tests (terminal + intermediate
   symlink); existing five-shape traversal-path table test
   still passes.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

Addressed codex r2 P1 (symlink escape) in 2bb8cd7:

  • VerifyChecksums symlink escape: after the r1 textual .. guard, os.Open(filepath.Join(root, cleaned)) would still follow a planted symlink at the terminal or any intermediate component. Fix: refuseSymlinkComponents walks the relative path segment-by-segment, os.Lstat's each, and returns ErrChecksumsSymlinkEscape on the first symlink found. Catches both terminal AND intermediate symlinks (stricter than O_NOFOLLOW, which only covers terminal).
  • New regression tests: TestVerifyChecksums_RejectsTerminalSymlink and TestVerifyChecksums_RejectsIntermediateSymlink. Both skip on Windows (requires SeCreateSymbolicLinkPrivilege); the fix itself runs on every platform.

Caller audit: refuseSymlinkComponents is only called from verifyOneChecksumLineVerifyChecksums; the latter has no production callers (test-only). CLI uses WriteChecksums, unchanged.

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 1m 27s —— View job


Code Review: PR #810 — r2 (2bb8cd7f, symlink escape fix)

Todo list

  • Load tools
  • Read checksums.go and checksums_test.go (current state)
  • Read main.go (check remaining open items)
  • Verify symlink fix correctness (refuseSymlinkComponents)
  • Check remaining open items from prior rounds
  • Post review findings

Symlink fix — refuseSymlinkComponents

checksums.go:236-253

The implementation is correct and stricter than the alternatives:

  • Splits rel on string(filepath.Separator) after validateChecksumRelPath has already run filepath.Clean, so separators are OS-normalised before the split. On Linux and macOS the CHECKSUMS forward-slash paths round-trip correctly; on Windows filepath.Clean converts them to backslashes, and the split follows. ✓
  • os.Lstat at each path component (not os.Stat) means the check never resolves the symlink itself — it examines the symlink node. ✓
  • Per-component walk catches intermediate symlinks (a directory component that is a symlink pointing outside root), which O_NOFOLLOW or a single os.Lstat(joined) at the end would miss. The comment in the code correctly explains why. ✓
  • seg == "" skip is belt-and-braces and harmless — filepath.Clean would not produce consecutive separators. ✓

checksums_test.go:162-245

TestVerifyChecksums_RejectsTerminalSymlink calls WriteChecksums before planting the symlink (so CHECKSUMS contains only the honest files), then appends a hand-crafted entry for the symlinked path. refuseSymlinkComponents fires before the hash comparison, so the hardcoded digest in the appended line doesn't need to be correct — the test is not testing hash matching, it's testing symlink rejection. The approach is sound. ✓

TestVerifyChecksums_RejectsIntermediateSymlink uses an all-zeros digest (same rationale) for a path through a directory that is a symlink. Also correct. ✓

Both tests skip on Windows with an appropriate reason. ✓


Still open from previous rounds

These three items were explicitly deferred by the author and remain unchanged:

Severity Issue Location
P2 cfg.outputRoot + "/MANIFEST.json" — string concat instead of filepath.Join main.go:281
P2 defer func() { _ = out.Close() }() — silent Close error on MANIFEST writer main.go:285
P3 //nolint:gocritic in test helper main_test.go:216

The P2 silent-Close pattern also appears in WriteChecksums at checksums.go:75 (same defer func() { _ = out.Close() }(), though that one is protected by out.Sync() on line 81 too). Neither is a blocker given the Sync() guard, but they're worth noting for consistency.


Overall

The symlink escape fix is correctly implemented and well-tested. The three remaining items are all low-risk and non-blocking — the security-relevant findings (path traversal, symlink escape, OOM) have all been resolved. This is ready to merge once the stacking dependency (#806) lands.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2bb8cd7fb0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/checksums.go Outdated
Comment on lines +279 to +282
if filepath.IsAbs(rel) {
return "", errors.Wrapf(ErrChecksumsPathTraversal, "absolute path: %q", rel)
}
cleaned := filepath.Clean(rel)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use filepath.IsLocal for CHECKSUMS path validation

validateChecksumRelPath only rejects absolute and ..-prefixed paths, but it still accepts non-local Windows names like CON, CONOUT$, or COM1. During VerifyChecksums, those names are opened via os.Open(filepath.Join(root, cleaned)), which can access device paths outside the backup tree instead of regular files, so verification can succeed/fail based on host device state rather than dump contents. Replace this guard with a locality check (for example filepath.IsLocal on the original/cleaned path) to enforce the intended “confined to dump root” guarantee across platforms.

Useful? React with 👍 / 👎.

…sLocal)

Codex r3 P1: validateChecksumRelPath rejected `..`-traversal and
absolute paths but accepted Windows reserved device names like
CON / NUL / COM1 / CONOUT$. On Windows, os.Open of those names
yields the host's console / device rather than a regular file
under the dump root, so verification can succeed/fail based on
host device state instead of dump contents.

Fix: replace the explicit absolute/traversal checks with
filepath.IsLocal (Go 1.20+), which delegates path-locality to
the stdlib. On Windows IsLocal additionally rejects the
reserved device names; on Unix the names look like ordinary
files (and would be hashed if they exist under root, the
honest dump shape).

Empty / `.` are still explicitly rejected because IsLocal
considers `.` local (it stays within the subtree, but it
names the root directory, which is not a hashable regular
file). The post-Clean recheck below IsLocal is belt-and-braces
against a future stdlib change.

Tests:
- TestValidateChecksumRelPath_RejectsWindowsDeviceNames pins
  the rejection. The reserved-name check is Windows-only by
  filepath.IsLocal's contract, so the test skips on non-Windows;
  the lexical fix itself runs on every platform.
- TestValidateChecksumRelPath_AcceptsHonestPaths cross-checks
  the IsLocal switchover does not regress the five legitimate
  Phase 0a dump-tree shapes (Redis blob, S3 nested object,
  DynamoDB schema, SQS messages.jsonl).

Caller audit (CLAUDE.md "semantic-change → grep all callers"):
- validateChecksumRelPath only called from
  verifyOneChecksumLine → VerifyChecksums; the latter has no
  production callers (test-only). The semantic change is
  purely rejection-side tightening on Windows; honest CHECKSUMS
  unaffected.

Self-review:
1. Data loss — none (read-side hardening).
2. Concurrency — none.
3. Performance — IsLocal is O(path-len); a single pass.
4. Data consistency — Windows device-name escape closed.
5. Test coverage — two new tests; existing five-shape
   traversal-path table test still passes.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

Addressed codex r3 P1 in b05ec87:

  • VerifyChecksums Windows reserved names: validateChecksumRelPath previously rejected ..-traversal and absolute paths but accepted CON / NUL / COM1 / CONOUT$. Fix: replaced the explicit checks with filepath.IsLocal (Go 1.20+), which delegates path-locality to the stdlib. On Windows IsLocal additionally rejects reserved device names; on Unix those strings are ordinary filenames and would only be hashed if they exist under root (the honest dump shape).
  • Empty / . still rejected explicitly; the post-Clean recheck below IsLocal is belt-and-braces against a future stdlib loosening.
  • New tests: TestValidateChecksumRelPath_RejectsWindowsDeviceNames (skips on non-Windows since the reserved-name check is platform-specific per IsLocal's contract) and TestValidateChecksumRelPath_AcceptsHonestPaths (cross-checks that the IsLocal switchover does not regress the five legitimate Phase 0a dump-tree shapes).

Caller audit: validateChecksumRelPath is only called from verifyOneChecksumLineVerifyChecksums; the latter has no production callers (test-only).

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 1m 12s —— View job


Code Review: PR #810 — r3 (b05ec878)

Todo list

  • Load tools
  • Read checksums.go and checksums_test.go (current state)
  • Verify r3 fix: filepath.IsLocal replacement in validateChecksumRelPath
  • Check remaining open items from prior rounds
  • Post final review

filepath.IsLocal fix — validateChecksumRelPath

checksums.go:286-306

The fix is correct and well-structured. Reviewing the call chain:

  1. Empty/. caught explicitly upfront — both surface as ErrChecksumsMalformedLine. Correct: filepath.IsLocal(".") returns true (it is local), so the explicit pre-check is necessary to prevent hashing a root directory. ✓
  2. filepath.IsLocal(rel) called on the raw pre-Clean form. This is the right order — Clean could theoretically normalise a dodgy path that IsLocal would catch, so calling on the original is strictly safer. ✓
  3. filepath.Clean(rel) applied after the IsLocal guard to produce the canonical form returned to the caller. ✓
  4. Belt-and-braces ..-prefix and filepath.IsAbs re-checks after Clean — documented as guards against a future IsLocal stdlib loosening. Harmless and the rationale is sound. ✓

Forward-slash interop on Windows: CHECKSUMS entries are written with forward-slash separators (filepath.ToSlash at checksums.go:119). On Windows, filepath.IsLocal accepts forward slashes and correctly converts before reserved-name checks — this is part of the stdlib contract since Go 1.20. ✓


New tests ✅

TestValidateChecksumRelPath_RejectsWindowsDeviceNames (checksums_test.go:261)

  • Windows-only skip is correct — filepath.IsLocal only rejects reserved names on Windows; on Unix CON/NUL/etc. are ordinary filenames that would only fail downstream when the file doesn't exist.
  • Tests validateChecksumRelPath directly (not VerifyChecksums) — the test comment explains why: the os.Lstat step downstream would surface ENOENT on Unix, masking the ErrChecksumsPathTraversal rejection class being pinned. ✓
  • The seven names cover both CON/NUL/PRN/AUX (device streams), COM1/LPT1 (serial/parallel), and CONOUT$ (extended name form). ✓

TestValidateChecksumRelPath_AcceptsHonestPaths (checksums_test.go:288)

  • Five Phase 0a dump-tree paths: plain file, Redis, S3, DynamoDB, SQS. Covers all enabled adapters. ✓
  • Asserts got == filepath.Clean(p) — correctly pins the contract that the function returns the cleaned form. ✓

Still open from previous rounds

These three items were deferred in earlier rounds and remain unchanged:

Severity Issue Location
P2 cfg.outputRoot + "/MANIFEST.json" — string concat instead of filepath.Join main.go:281
P2 defer func() { _ = out.Close() }() — silent Close error; Sync() doesn't fully cover it main.go:285
P3 //nolint:gocritic in test violates "Avoid //nolint — refactor instead" main_test.go:216

None are blockers (the P2 string concat is cosmetic on POSIX; Sync() on line 289 covers the data-flush concern; the P3 nolint is in a test helper). Worth cleaning before merge but not holding up the security-relevant work.


Summary: The r3 fix is clean. filepath.IsLocal is the right delegation point for Windows reserved-name rejection. The call-order (raw rel before Clean) and belt-and-braces post-Clean guards are both correct. The two new tests are well-targeted and the Windows-only skip is properly justified. No new issues. All security-relevant findings across r1–r3 are now resolved.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b05ec8784c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/checksums.go Outdated
}
defer func() { _ = out.Close() }()
for _, e := range entries {
if _, err := fmt.Fprintf(out, "%s %s\n", e.hex, e.relPath); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Escape CHECKSUMS paths before writing line-based output

WriteChecksums writes each filename verbatim into a newline-delimited CHECKSUMS line ("%s %s\n"), but dump paths can legally contain newline characters (for example, S3 object keys are accepted without filtering \n), so one file can be serialized as multiple lines and make VerifyChecksums/sha256sum -c parse a corrupted manifest. In that case the backup becomes unverifiable even though file contents are intact; paths need sha256sum-compatible escaping (or an explicit rejection of newline-bearing names) before emitting the line format.

Useful? React with 👍 / 👎.

Three remaining reviewer findings on PR #810 (one P2, two
mediums) that the prior rounds did not address. None is a
fail-closed semantic change; together they tighten the
CHECKSUMS on-disk format and the cmd's manifest write path
against operational shapes the design promises to support.

1. (codex r4 P2) WriteChecksums wrote filenames verbatim into a
   newline-delimited file. A path containing `\n`, `\r`, or `\\`
   would corrupt the line-based output and let a single
   adversarial entry inject a second CHECKSUMS row that the
   verifier would accept.

   Fix: formatChecksumLine implements the GNU coreutils
   sha256sum(1) escape convention — leading `\` on the line
   plus `\n` / `\r` / `\\` substitutions. splitChecksumLine
   reverses the convention via unescapeChecksumPath at parse
   time. `sha256sum -c CHECKSUMS` from the dump root parses
   both forms identically, preserving the vendor-independent
   recovery property.

   Tests:
   - TestChecksumLine_EscapeRoundTrip — five paths covering
     `\n`, `\r`, `\\`, multiple escapes mixed, and the
     no-escape passthrough.
   - TestChecksumLine_EscapePrefixMatchesGNUFormat — pins the
     exact `\` + hex + `  ` + escaped-body shape so a future
     change cannot drift from sha256sum compatibility.
   - TestSplitChecksumLine_RejectsDanglingEscape — guards
     against a tampered CHECKSUMS using `\<EOL>` or `\x` for
     unknown `x`.

2. (gemini r1 medium, main.go:281) emitManifest concatenated
   `cfg.outputRoot + "/MANIFEST.json"`. On Windows this
   produces wrong path separators and would refuse a
   Windows-style output root.

   Fix: filepath.Join. Standard cross-platform shape.

3. (gemini r1 medium, main.go:285) emitManifest deferred a
   `_ = out.Close()` that swallowed the Close error. On NFS /
   FUSE filesystems Close is the authoritative durability
   signal (writeback errors surface there, not at Write time),
   so a silently-discarded Close error means the dump-tree
   invariant (MANIFEST.json on disk) can fail without the cmd
   surfacing it.

   Fix: explicit Sync() then Close() with both errors
   propagated; the defer'd close on error paths runs through
   `_ = out.Close()` because we have already surfaced the
   primary failure.

Caller audit:
- formatChecksumLine / unescapeChecksumPath are new helpers
  called only from WriteChecksums / splitChecksumLine. No
  cross-package use.
- splitChecksumLine's exported behavior is rejection-side
  tightening (dangling escape) plus correct round-trip of
  escape-needing paths. No production caller of VerifyChecksums
  exists (test-only).
- emitManifest is internal to cmd/elastickv-snapshot-decode.
  The Close-error surface is purely additive — previously-
  successful runs stay successful.

Self-review:
1. Data loss — none; the new escape path PREVENTS data loss
   (line injection); the Close-error surface PREVENTS silent
   manifest-write loss on NFS.
2. Concurrency — none.
3. Performance — single Builder per escape-needing line; the
   no-escape fast path is unchanged.
4. Data consistency — sha256sum(1) escape round-trip is
   byte-identical (round-trip test pins it).
5. Test coverage — three new tests; existing seven CHECKSUMS
   tests still pass.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

Addressed three additional reviewer findings (codex P2 + 2 gemini mediums) in ec358ab:

  • Codex r4 P2 (WriteChecksums line-injection): WriteChecksums wrote filenames verbatim into a newline-delimited file. A path containing \n, \r, or \\ would corrupt the line-based output. Fix: formatChecksumLine implements GNU coreutils sha256sum(1) escape convention (leading \ + \n/\r/\\ substitutions); splitChecksumLine reverses it via unescapeChecksumPath at parse time. sha256sum -c CHECKSUMS parses both forms identically.
  • Gemini r1 medium (main.go:281): switched cfg.outputRoot + "/MANIFEST.json" to filepath.Join for cross-platform correctness.
  • Gemini r1 medium (main.go:285): emitManifest no longer swallows out.Close(). Explicit Sync() then Close() with both errors propagated — required for NFS/FUSE where Close is the authoritative durability signal.

Three new tests pin the round-trip and the parse-side guards (TestChecksumLine_EscapeRoundTrip, TestChecksumLine_EscapePrefixMatchesGNUFormat, TestSplitChecksumLine_RejectsDanglingEscape).

Caller audit: formatChecksumLine / unescapeChecksumPath are new helpers (only called from WriteChecksums / splitChecksumLine); splitChecksumLine is internal; emitManifest is internal to the cmd.

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 2m 42s —— View job


Code Review: PR #810 — r4 (ec358ab9)

Todo list

  • Load tools
  • Read checksums.go (escape/unescape logic + formatChecksumLine / unescapeChecksumPath)
  • Read checksums_test.go (new round-trip and edge-case tests)
  • Read main.go (filepath.Join + Close fix)
  • Verify formatChecksumLine / unescapeChecksumPath correctness against GNU sha256sum(1) spec
  • Verify splitChecksumLine parse correctness with escaped paths
  • Check remaining open items (P3 nolint)
  • Post review findings

Fix 1 — WriteChecksums line-injection (Codex r4 P2) ✅

checksums.go:98-117 / checksums.go:362-421

The escape/unescape implementation is correct. Traced through the full cycle:

  • formatChecksumLine(hex, "weird\nfile") → sb escapes \n to two-char sequence \ + n, returns (hex + " weird\nfile", "\") where \n is two chars. Prefix is the single-char \. The fmt.Fprintf(out, "%s%s\n", prefix, line) writes \<hex> weird\nfile as one line terminated by the real 0x0A. ✓
  • The bufio.Scanner reads exactly that one line (stops at 0x0A). ✓
  • splitChecksumLine: strings.HasPrefix(line, "\\") fires, strips the prefix byte, extracts body "weird\nfile" (two-char \n), calls unescapeChecksumPath. ✓
  • unescapeChecksumPath("weird\nfile"): on \ increments i, reads n, writes real 0x0A. Returns the original string. ✓

Edge cases verified:

  • Backslash-only path (\): formats as \<hex> \\, unescapes back to \. ✓
  • Unknown escape (\x): unescapeChecksumPath returns (_, false)splitChecksumLine returns false. ✓
  • Dangling backslash (trailing \): i >= len(body) guard fires, returns (_, false). ✓
  • No-escape pass-through: returns empty prefix, no unescape path touched. ✓

The three new tests (TestChecksumLine_EscapeRoundTrip, TestChecksumLine_EscapePrefixMatchesGNUFormat, TestSplitChecksumLine_RejectsDanglingEscape) pin all four arms correctly.


Fix 2 — filepath.Join for MANIFEST path ✅

main.go:284

out, err := os.Create(filepath.Join(cfg.outputRoot, "MANIFEST.json"))

Correct. ✓


Fix 3 — Silent Close error on MANIFEST writer ✅

main.go:299-310

emitManifest now uses the correct pattern: WriteManifest error → Close (discarded) + return; Sync error → Close (discarded) + return; explicit Close with error propagated. All three early-exit paths surface errors cleanly. The pattern mirrors what the project's storage layer does for write-then-sync durability. ✓


P3 — Two //nolint directives remain

checksums.go:103 (new in r4):

sb.Grow(len(relPath) + 4) //nolint:mnd // small heuristic to avoid early grow

Per project guidelines: "Avoid //nolint — refactor instead." This can be replaced with a named constant:

const escapeGrowPad = 4 // one escape sequence adds at most one extra byte, plus prefix
sb.Grow(len(relPath) + escapeGrowPad)

main_test.go:216 (carried over from prior rounds — still open):

encKey := append([]byte(backup.RedisStringPrefix+userKey), make([]byte, snapshotTSSize)...) //nolint:gocritic

Same refactor as before — make + copy over append. Not a blocker since it's in a test helper.


Minor doc note — "binary mode" wording in WriteChecksums

checksums.go:52

Each line is <hex> <relative-path>\n (two spaces, "binary" mode in sha256sum vocabulary).

Two-space separator is sha256sum's text mode; binary mode uses *. This is a documentation mistake only — the emitted format (two spaces) is correct and sha256sum -c CHECKSUMS accepts it. Worth fixing before the doc is read by a future restore-tool author who checks against the sha256sum(1) man page.


Summary

Severity Issue Location Status
✅ Fixed WriteChecksums newline-injection via unescaped filenames checksums.go r4
✅ Fixed filepath.Join for MANIFEST path main.go:284 r4
✅ Fixed Silent Close error on MANIFEST writer main.go:285 r4
P3 //nolint:mnd in formatChecksumLine (new) checksums.go:103 Open
P3 //nolint:gocritic in test helper main_test.go:216 Open (prior)
P3 Doc: "binary mode" should be "text mode" in WriteChecksums checksums.go:52 Open

All security-relevant findings across r1–r4 are resolved. The two //nolint violations and the doc wording are the only remaining items — none are blockers. This is ready to merge once #806 lands and the stacking dependency is resolved.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/backup/checksums_test.go (1)

107-117: ⚡ Quick win

Pin ErrChecksumMismatch in the tampering test.

This currently passes on any non-nil error, so a regression to a generic parse or I/O failure would still look green.

Suggested change
-	if err := VerifyChecksums(root); err == nil {
-		t.Fatalf("expected VerifyChecksums to detect tampering, got nil")
-	}
+	err := VerifyChecksums(root)
+	if !errors.Is(err, ErrChecksumMismatch) {
+		t.Fatalf("err = %v, want ErrChecksumMismatch", err)
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/checksums_test.go` around lines 107 - 117, Update
TestVerifyChecksums_DetectsTampering to assert the returned error specifically
indicates a checksum mismatch rather than any non-nil error: after calling
VerifyChecksums(root) replace the generic nil check with an errors.Is(err,
ErrChecksumMismatch) (or equivalent comparison) and fail the test if that check
is false; reference the test name TestVerifyChecksums_DetectsTampering, the
VerifyChecksums call, and the ErrChecksumMismatch sentinel so the test only
passes when VerifyChecksums returns the expected checksum-mismatch error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/backup/checksums.go`:
- Around line 71-82: The WriteChecksums function currently defers out.Close()
and only returns the error from out.Sync(), which discards Close() failures;
replace the deferred close with explicit handling: remove the defer, call
syncErr := out.Sync() after writing, then call closeErr := out.Close(), and
return a wrapped/combined error that surfaces closeErr when present (e.g., if
closeErr != nil return errors.WithStack(fmt.Errorf("%v; close error: %w",
syncErr, closeErr)) else return errors.WithStack(syncErr)). Refer to the file
writer variable out and the existing out.Sync() call when making this change.

---

Nitpick comments:
In `@internal/backup/checksums_test.go`:
- Around line 107-117: Update TestVerifyChecksums_DetectsTampering to assert the
returned error specifically indicates a checksum mismatch rather than any
non-nil error: after calling VerifyChecksums(root) replace the generic nil check
with an errors.Is(err, ErrChecksumMismatch) (or equivalent comparison) and fail
the test if that check is false; reference the test name
TestVerifyChecksums_DetectsTampering, the VerifyChecksums call, and the
ErrChecksumMismatch sentinel so the test only passes when VerifyChecksums
returns the expected checksum-mismatch error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9ec94dd-3a71-43ec-b607-205c3f1e43fb

📥 Commits

Reviewing files that changed from the base of the PR and between 2c3cd69 and ec358ab.

📒 Files selected for processing (6)
  • cmd/elastickv-snapshot-decode/main.go
  • cmd/elastickv-snapshot-decode/main_test.go
  • internal/backup/checksums.go
  • internal/backup/checksums_test.go
  • internal/backup/source.go
  • internal/backup/source_test.go

Comment thread internal/backup/checksums.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec358ab9ed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// decodeOne runs the dispatcher against the configured input and
// writes MANIFEST.json + CHECKSUMS into the output tree.
func decodeOne(cfg *config, logger *slog.Logger) error {
if err := os.MkdirAll(cfg.outputRoot, defaultOutputDirMode); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Refuse non-empty output roots before decoding

decodeOne only calls os.MkdirAll and then writes new adapter files into whatever is already under --output. Because WriteChecksums later hashes all regular files under that root, rerunning the CLI into a previously used directory can silently include stale files from an older snapshot in the new CHECKSUMS and dump tree. This makes the resulting backup represent a mixed state rather than the requested snapshot; consider failing fast when --output is non-empty (or cleaning adapter subtrees first).

Useful? React with 👍 / 👎.

Coderabbit r5 Major (checksums.go:82): WriteChecksums returned
out.Sync() but discarded the deferred out.Close() error. Same
shape as the r4 gemini medium on emitManifest, applied here too.
On NFS / some FUSE backends Close is where buffered-writeback
failures surface, so a swallowed Close means the CLI can declare
success on a truncated or unreadable CHECKSUMS file.

Fix: explicit Sync() then Close() with both errors propagated,
with the pre-existing defer'd Close swallowed only on the
error-return paths (the primary failure has already been
surfaced). Extracted writeAllChecksumLines so the parent stays
a thin open / write / sync / close skeleton with the durability
contract spelled out.

Caller audit (CLAUDE.md "semantic-change → grep all callers"):
- WriteChecksums has one caller
  (cmd/elastickv-snapshot-decode/main.go:253), which already
  surfaces the returned error verbatim via errors.Wrap. The
  semantic tightening (Close errors now surface) is honored
  by the existing wrapping; no caller change needed.

Also addresses coderabbit r5 nitpick: typed-sentinel assertion
in TestVerifyChecksums_DetectsTampering (was checking err !=
nil generically; now pins ErrChecksumMismatch so a future
change that turns the tamper signal into a different error
class surfaces as a test failure).

Self-review:
1. Data loss — fix prevents silent CHECKSUMS-truncation on
   NFS / FUSE.
2. Concurrency — none.
3. Performance — no new I/O; just surfaces an existing error.
4. Data consistency — durability contract now honest.
5. Test coverage — typed-sentinel assertion added; existing
   round-trip tests still pass.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 23, 2026

@claude review

Addressed coderabbit r5 Major + nitpick in 7a8e2f3:

  • Coderabbit Major (checksums.go:82 — Close error swallowed): WriteChecksums returned only Sync()'s error; the deferred Close() was discarded. Fix: explicit Sync() then Close() with both errors propagated, same pattern as r4's emitManifest fix. Extracted writeAllChecksumLines so WriteChecksums stays a thin open/write/sync/close skeleton.
  • Coderabbit nitpick (TestVerifyChecksums_DetectsTampering): typed-sentinel assertion added — now pins ErrChecksumMismatch instead of any non-nil error, so a future change that turns tamper detection into a different error class surfaces as a test failure.

Caller audit: WriteChecksums has one caller (cmd/elastickv-snapshot-decode/main.go:253), which already surfaces the returned error verbatim via errors.Wrap. The semantic tightening (Close errors now surface) is honored by the existing wrapping.

@claude
Copy link
Copy Markdown

claude Bot commented May 23, 2026

Claude finished @bootjp's task in 1m 50s —— View job


Code Review: PR #810 — r5 (7a8e2f30)

Todo list

  • Load tools
  • Read checksums.go and checksums_test.go (current state)
  • Verify r5 fix: WriteChecksums Close error propagation
  • Verify r5 nitpick fix: TestVerifyChecksums_DetectsTampering sentinel assertion
  • Check for remaining open items
  • Post review findings

Fix 1 — WriteChecksums Close error propagation (CodeRabbit Major) ✅

checksums.go:63-109

The extraction to writeAllChecksumLines and the explicit Sync+Close lifecycle are both correct. Tracing all exit paths from WriteChecksums:

  • collectChecksumEntries errors → return early, no file opened. ✓
  • os.Create errors → return early, no file opened. ✓
  • writeAllChecksumLines errors → _ = out.Close() (correct to discard: primary error is the write failure) + return. ✓
  • out.Sync() errors → _ = out.Close() (same rationale) + return. ✓
  • Happy path → out.Close() error is the return value, surfaces to caller. ✓

writeAllChecksumLines(w io.Writer, ...) takes an io.Writer rather than *os.File, which is the right abstraction — the loop owns nothing about durability, the skeleton above it owns the open/sync/close contract. Clean separation.

The comment block at checksums.go:83-90 correctly explains why Close() failures matter on NFS/FUSE backends. Good rationale to leave in.


Fix 2 — TestVerifyChecksums_DetectsTampering typed sentinel (CodeRabbit nitpick) ✅

checksums_test.go:111-126

err := VerifyChecksums(root)
if err == nil {
    t.Fatalf("expected VerifyChecksums to detect tampering, got nil")
}
if !errors.Is(err, ErrChecksumMismatch) {
    t.Fatalf("err = %v, want ErrChecksumMismatch", err)
}

Dual-check is the right pattern here: the nil guard gives a clean "no error at all" message, and the errors.Is check pins the specific sentinel. A future refactor that accidentally returns ErrChecksumsMalformedLine from the tamper path will now surface as a test failure. ✓


Still open from prior rounds

Severity Issue Location Status
P2 Non-empty --output dir mixes stale files into new CHECKSUMS (Codex r4) main.go:214 Open
P3 //nolint:mnd in formatChecksumLine — named constant would fix checksums.go:129 Open (r4)
P3 //nolint:gocritic in test helper main_test.go:216 Open (early rounds)
P3 Doc: two-space separator is text mode, not "binary" mode in sha256sum(1) checksums.go:49 Open (r4)

The P2 non-empty-output-root finding (Codex r4) is worth noting: if the operator reruns decodeOne into an existing --output directory, WriteChecksums will hash all regular files found there — including stale files from a prior decode run — and produce a CHECKSUMS that attests to a mixed-snapshot tree. The fix would be a preflight check that rejects a non-empty outputRoot (or documents "caller must use a fresh directory"). Not a blocker if the CLI docs/--help call this out, but it's easy to misuse silently.

The three P3 items are cosmetic. The security-relevant findings across all five rounds are resolved. This is ready to merge once #806 lands.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a8e2f308e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/checksums.go Outdated
Comment on lines +255 to +256
if line == "" {
return nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail verification when CHECKSUMS has no valid entries

verifyOneChecksumLine treats blank lines as success, and VerifyChecksums never enforces that at least one checksum row was parsed. As a result, a truncated/corrupted CHECKSUMS file containing only empty lines returns nil, so integrity verification can report success without checking any file content. This diverges from sha256sum -c semantics for malformed/empty manifests and can hide backup corruption.

Useful? React with 👍 / 👎.

Comment on lines +305 to +306
if err != nil {
return errors.WithStack(err)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify missing checksum targets as checksum failures

When a listed file is missing, refuseSymlinkComponents returns the raw os.Lstat error, so callers cannot detect it via errors.Is(err, ErrChecksumMismatch) and it bypasses the verifier's typed checksum-failure path. In practice, partial restores or deleted files produce an unclassified path error instead of a checksum verification failure, which breaks error handling that keys on checksum result types.

Useful? React with 👍 / 👎.

…sification)

Two codex P2 findings on r5 (commit 7a8e2f3):

1. (codex P2, checksums.go:256) VerifyChecksums returned nil on
   a CHECKSUMS file with zero parsed rows (empty file, blank
   lines only). Phase 0a dumps are non-empty by construction
   (WriteChecksums lists every regular file), so a zero-row
   CHECKSUMS is always a producer-side corruption signal, never
   a benign shape. The verifier was hiding partial-write or
   truncation failures behind a "no rows checked == OK" answer.

   Fix: VerifyChecksums now counts verified rows and returns
   ErrChecksumsEmpty when the count is zero. Blank lines still
   skip silently (matching sha256sum -c's tolerance of trailing
   newlines) but do not satisfy the at-least-one-row guard.

   Test: TestVerifyChecksums_RejectsEmptyChecksumsFile covers
   three corruption shapes (empty file, blank lines only,
   trailing-newline only).

2. (codex P2, checksums.go:306) refuseSymlinkComponents returned
   raw fs.ErrNotExist when a CHECKSUMS-listed file was missing
   from disk. Callers branching on errors.Is(err,
   ErrChecksumMismatch) missed this case — partial restores and
   tamper-deletes produced an unclassified path error instead.

   Fix: detect fs.ErrNotExist at the refuseSymlinkComponents
   call site (and at sha256File's call site as belt-and-braces)
   and promote to ErrChecksumMismatch. The design's contract
   ("Files referenced by CHECKSUMS but missing on disk surface
   as the same error class so a partial-restore is obvious") is
   now honored.

   Test: TestVerifyChecksums_MissingTargetIsMismatch — writes
   a CHECKSUMS row, deletes the listed file, asserts
   ErrChecksumMismatch via errors.Is.

API shape: verifyOneChecksumLine returns (bool, error) so the
parent loop can count "real" verified rows separately from
blank-line skips. Internal helper; signature change does not
escape the package.

Caller audit:
- verifyOneChecksumLine is internal to checksums.go; the
  signature change is contained.
- VerifyChecksums has no production callers yet (test-only).
  The new ErrChecksumsEmpty and the promoted ErrChecksumMismatch
  shapes are additive on the typed-sentinel surface; honest
  CHECKSUMS still verify cleanly.

Self-review:
1. Data loss — none; both fixes catch silent-corruption signals.
2. Concurrency — none.
3. Performance — one int counter per file; sha256 streaming
   unchanged.
4. Data consistency — empty-CHECKSUMS hole closed; missing-
   target now classified.
5. Test coverage — two new tests; existing thirteen CHECKSUMS
   tests still pass.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 24, 2026

@claude review

Addressed two codex r5 P2 findings in 2822f2d:

  • Codex P2 (checksums.go:256 — empty CHECKSUMS hidden): VerifyChecksums returned nil on a CHECKSUMS file with zero parsed rows. Phase 0a dumps are non-empty by construction, so an empty CHECKSUMS is always a producer-side corruption signal. Fix: count verified rows and return new typed sentinel ErrChecksumsEmpty when count is zero. Blank-line tolerance preserved (sha256sum -c shape) but does not satisfy the at-least-one-row guard.
  • Codex P2 (checksums.go:306 — missing target unclassified): refuseSymlinkComponents returned raw fs.ErrNotExist when a listed file was missing. Callers branching on errors.Is(err, ErrChecksumMismatch) missed partial-restore / delete cases. Fix: detect fs.ErrNotExist at the call sites and promote to ErrChecksumMismatch, honoring the design's "missing surfaces as same error class" contract.
  • verifyOneChecksumLine signature changed to (bool, error) so the parent counts real rows separately from blank-line skips (internal helper, contained).

Two new regression tests: TestVerifyChecksums_RejectsEmptyChecksumsFile (three corruption shapes) and TestVerifyChecksums_MissingTargetIsMismatch.

Caller audit: verifyOneChecksumLine is internal to checksums.go; VerifyChecksums has no production callers (test-only). Both new typed sentinels are additive.

@claude
Copy link
Copy Markdown

claude Bot commented May 24, 2026

Claude finished @bootjp's task in 1m 54s —— View job


Code Review: PR #810 — r6 (2822f2db)

Todo list

  • Load tools
  • Read checksums.go and checksums_test.go (current state)
  • Verify Fix 1: ErrChecksumsEmpty guard on zero-row CHECKSUMS
  • Verify Fix 2: ErrNotExistErrChecksumMismatch promotion
  • Verify verifyOneChecksumLine signature change
  • Check for remaining open items
  • Post review findings

Both r6 fixes are correct. Detailed analysis below.


Fix 1 — ErrChecksumsEmpty on zero-row CHECKSUMS ✅

checksums.go:228-261

The implementation is correct. VerifyChecksums counts verified separately from lineNum; blank-line skips (verifyOneChecksumLine returns (false, nil)) don't increment the counter. The three scanner paths through the empty-file guard:

  • Empty file (""): sc.Scan() never fires → verified == 0ErrChecksumsEmpty. ✓
  • Blank-lines only ("\n\n\n"): three scans of "", each returns (false, nil)verified == 0ErrChecksumsEmpty. ✓
  • Malformed lines: verifyOneChecksumLine returns (false, ErrChecksumsMalformedLine)VerifyChecksums returns immediately without reaching the verified == 0 guard. Correct — a CHECKSUMS with malformed lines surfaces the specific parse error, not the generic empty-guard. ✓

One style note: ErrChecksumsEmpty is declared at checksums.go:263-268, after VerifyChecksums which uses it at line 258. Go is order-independent within a package so this compiles and runs correctly; just slightly unusual to read. Not a bug.

The new test TestVerifyChecksums_RejectsEmptyChecksumsFile covers all three corruption shapes with the correct errors.Is(err, ErrChecksumsEmpty) assertion. ✓


Fix 2 — ErrNotExistErrChecksumMismatch promotion ✅

checksums.go:297-328

The promotion is correctly applied at two distinct call sites:

After refuseSymlinkComponents (checksums.go:305-308): os.Lstat returns fs.ErrNotExist for a missing file — this propagates out of refuseSymlinkComponents as errors.WithStack(err) and is caught here. Correctly promoted.

After sha256File (checksums.go:319-322): This guards the TOCTOU race — if the file was deleted between the os.Lstat in refuseSymlinkComponents and the os.Open in sha256File. The duplication is intentional and the comment explains it. Not dead code.

The comment at checksums.go:298-304 correctly documents that a missing intermediate directory also surfaces as fs.ErrNotExist from refuseSymlinkComponents — for example, rel = "a/b/c.txt" where a/b/ was deleted. Treating this as ErrChecksumMismatch is the right classification: the listed file can't exist, which is operationally the same as tamper or partial-restore.

The new test TestVerifyChecksums_MissingTargetIsMismatch correctly exercises this path. ✓


verifyOneChecksumLine signature change ✅

checksums.go:284

Changed from error to (bool, error). The bool signals "this was a real checksum row" (as opposed to a blank-line skip). The parent loop uses it correctly:

got, err := verifyOneChecksumLine(root, sc.Text(), lineNum)
if err != nil {
    return err
}
if got {
    verified++
}

The function is package-private (verifyOneChecksumLine unexported), so the signature change is fully contained. Caller audit: only one call site, in VerifyChecksums. ✓


Still open from prior rounds

Severity Issue Location Status
P3 //nolint:mnd in formatChecksumLine — named constant would fix checksums.go:130 Open (r4)
P3 //nolint:gocritic in test helper main_test.go:216 Open (early rounds)
P3 Doc: two-space separator is text mode, not "binary" mode in sha256sum(1) checksums.go:50 Open (r4)
P2 Non-empty --output dir mixes stale files into new CHECKSUMS main.go:214 Open (r5)

The P3 items are cosmetic and the P2 stale-output-root finding is operator-ergonomics (not a correctness bug — the CHECKSUMS is accurate for whatever files are present; it just attests to a mixed tree). None are blockers.


Summary: All security-relevant findings across r1–r6 are resolved. The two ErrNotExist promotion sites are both necessary (primary path + TOCTOU belt-and-braces). The ErrChecksumsEmpty guard correctly distinguishes "zero verified rows" from "some malformed rows" by exiting early on parse errors before reaching the empty guard. Ready to merge once #806 lands.

@bootjp bootjp merged commit fde4ee3 into main May 24, 2026
9 of 11 checks passed
@bootjp bootjp deleted the backup/snapshot-decode-cmd branch May 24, 2026 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant