Skip to content

[Raft] Add Raft persistence#1470

Merged
barroco merged 1 commit into
interuss:masterfrom
Orbitalize:add_raft_persistence
Jun 8, 2026
Merged

[Raft] Add Raft persistence#1470
barroco merged 1 commit into
interuss:masterfrom
Orbitalize:add_raft_persistence

Conversation

@MariemBaccari

@MariemBaccari MariemBaccari commented May 20, 2026

Copy link
Copy Markdown
Contributor

This PR is part of the chain #1470 -> #1473 -> #1474 to address the issue #1463.
It adds WAL and snapshot persistency and management for the raftstore. Contrary to the diagrams from the issue #1463, we embed *Raft.MemoryStorage in the storage instead of setting it as a field in Consensus to avoid scattering the storage management.

Comment thread pkg/raftstore/consensus/storage.go Outdated

@the-glu the-glu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread pkg/raftstore/consensus/storage.go Outdated
Comment thread pkg/raftstore/consensus/storage.go Outdated
Comment thread pkg/raftstore/consensus/storage.go Outdated
Comment thread pkg/raftstore/consensus/storage.go Outdated
Comment thread pkg/raftstore/params/params.go Outdated
return stacktrace.Propagate(err, "failed to save snapshot to wal")
}

return s.wal.ReleaseLockTo(snapshot.Metadata.Index)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment on where the lock was acquired

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
And does the lock need to be released when an error is returned?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are os level flock locks on WAL segment files that are acquired internally in the WAL implementation. ReleaseLockTo is used to release the locks for the entries that are already covered by the snapshot and, thus, the files are not needed anymore.
We shouldn't release the locks when an error is returned since the files are still needed if we fail to save the snapshot.
I realized, while looking into etcd's source code, that this call is actually useless in the current state of our implementation. Etcd uses this as a way to mark files that can be cleaned up (separately). I will keep it for now and add a comment to the function and the github issue to implement the cleanup later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also include this rationale in the comment in code :)

@barroco

barroco commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Thank you @MariemBaccari, please find some suggestions inline.

Comment thread pkg/raftstore/consensus/consensus.go Outdated
Comment thread pkg/raftstore/params/params.go Outdated
func init() {
flag.Uint64Var(&connectParameters.ID, "raft_node_id", 0, "raft node ID for this instance (must be non-zero and unique within the cluster)")
flag.StringVar(&connectParameters.Peers, "raft_peers", "", `comma-separated "nodeID=peerURL" pairs for all cluster members, including the current node, e.g. "1=http://node1:9021,2=http://node2:9021,3=http://node3:9021"`)
flag.StringVar(&connectParameters.DataDir, "raft_data_directory", defaultDataDir, "directory for raft data (snapshot and WAL storage)")

@mickmis mickmis Jun 2, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever we write stuff to the local filesystem that becomes a something the operations need to deal with. So somehow, somewhere, we need documentation about how to handle the data in that directory. Is it temp data that can be trashed? If trashed how long does it take to reconstruct it? If not temp, does it need to be backed up? What happens if it is lost? How large can it become? Can it be hot swapped? Is there any impact from the underlying storage (e.g. if it is too slow?) etc.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. I suspect snapshotCatchUpEntriesN has an impact on the size of file on filesytem: have this configurable and document it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clarified the raft_datadir flag description, do you think that's sufficient for the moment ? I also added a cleanup task in #1463. I think the implementation of that will come with more documentation regarding the growth of the data directory.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clarified the raft_datadir flag description, do you think that's sufficient for the moment ? I also added a cleanup task in #1463. I think the implementation of that will come with more documentation regarding the growth of the data directory.

This should be part of the user documentation - so not in the code necessarily (although code comment is useful too). I'd say as long as it is tracked that it is to be included in the documentation we are OK. I can't find on #1463 though?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had added the "Old files cleanup" task to "future changes". I just specified the user documentation part now.

Comment thread pkg/raftstore/consensus/storage.go Outdated
@MariemBaccari MariemBaccari force-pushed the add_raft_persistence branch 2 times, most recently from 62208b4 to 610ad5f Compare June 2, 2026 14:21
@MariemBaccari MariemBaccari requested review from barroco and mickmis June 2, 2026 14:36

@mickmis mickmis left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if already suggested by @barroco, but using https://pkg.go.dev/go.etcd.io/raft/v3@v3.6.0/rafttest for testing would be quite important (not necessarily in this PR, but important to do in general)

Comment thread pkg/raftstore/consensus/storage.go Outdated
return nil, false, stacktrace.Propagate(err, "failed to create directory for wal storage at: %s", walPath)
}

w, err := wal.Create(logger, walPath, nil)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w is redefined here. If that is intended, why declaring var w above?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, I did not intend to redefine it.

return stacktrace.Propagate(err, "failed to save snapshot to wal")
}

return s.wal.ReleaseLockTo(snapshot.Metadata.Index)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also include this rationale in the comment in code :)

Comment thread pkg/raftstore/consensus/storage.go Outdated
}

func loadSnapshot(logger *zap.Logger, walPath string, snapshotter *snap.Snapshotter) (*raftpb.Snapshot, error) {
if !wal.Exist(walPath) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is done again right after the only call to loadSnapshot: looks like the control flow could be clarified a bit. Is the intent to load a snapshot only if the wal does not exist? If so, what about removing this here and calling loadSnapshot in an else of if !ok?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot is loaded only if the wal does already exist but yes it's better to move the load in an else, the Exist call is redundant. Will fix this.

}

// getSnapshot calls all registered snapshot providers and combines their data into a single snapshot.
func (s *storage) getSnapshot() ([]byte, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this fits json.Marshaler interface: https://pkg.go.dev/encoding/json#Marshaler

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true but I think we should keep this method signature as it clearly indicates the specific purpose of getting a snapshot of the storage. The underlying implementation can always change and just happens to be a json marshalling for the moment.

return stacktrace.Propagate(err, "failed to save WAL entries")
}

if !raft.IsEmptySnap(snapshot) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Control flow: is it actually necessary to check for this condition twice?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we cannot change the ordering of saving the snapshot -> saving the entries -> applying, I think we have to check for this condition twice.

// newStorage initializes the storage by loading the latest snapshot and wal entries from the disk
// and applies them to the Raft memory storage.
// It returns the initialized storage, a boolean indicating whether the storage was pre-existent or an error.
func newStorage(ctx context.Context, logger *zap.Logger, dataDir string, nodeID uint64, snapshotCatchUpEntries uint64) (*storage, bool, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And an additional clarification: is this storage OK in a cluster environment where there are multiple DSS pods? Is this the intent of the nodeID or not at all?

@MariemBaccari MariemBaccari Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the intent of nodeID :) I can run multiple nodes locally safely.

@MariemBaccari MariemBaccari force-pushed the add_raft_persistence branch from 610ad5f to 78db271 Compare June 8, 2026 06:52
@MariemBaccari MariemBaccari force-pushed the add_raft_persistence branch from 78db271 to 090d9a6 Compare June 8, 2026 07:18
@barroco barroco merged commit 67e337c into interuss:master Jun 8, 2026
12 checks passed
@barroco barroco deleted the add_raft_persistence branch June 8, 2026 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants