Lifting homeobject onto HomeStore v8 (off Folly) — migration plan
Hints captured right after doing the same lift for homeblocks (reference implementation:
homeblocks commit c43fa2d, "Remove Folly and redesign the public API onto the v8 coroutine stack").
homeobject starts from the same Folly baseline, so most of the work is mechanical reuse, but it has three areas
homeblocks didn't have; they are called out under The hard parts.
Baseline → target
| | now | target |
|---|---|---|
| homestore | ^7.5.2 (folly futures) | ^8 (sisl::async coroutines / std::expected) |
| sisl | ^13.2 | ^14.6 |
| iomgr | transitive (also `#include <iomgr/iomgr.hpp>` in gc_manager) | ^13; make it an explicit requires |
| nuraft_mesg | transitive via homestore | ^5, transitive (peer_id_t / replica_id_t / group_id_t come from here; expect renames) |
| C++ std | 20 | 23 (needed for std::expected; coroutines themselves are C++20) |
| futures | Folly | gone |
Build all four (sisl/iomgr/nuraft_mesg/homestore) editable from ~/dev/oss, same as the homeblocks lift.
The reusable playbook (mechanical — identical to homeblocks)
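Apply these 1:1. The full type/API dictionary lives in the homeblocks memory note folly-to-v8-migration-dict; the short version:

| now | target |
|---|---|
| `Manager<E>::AsyncResult<T> = folly::SemiFuture<folly::Expected<T,E>>` | `task<std::expected<T,E>>` (keep the error type; see hard part #4) |
| `folly::Future<X>` / `folly::SemiFuture<X>` | `sisl::async::task<X>` |
| `folly::Promise<T>` (in `repl_result_ctx<T>`) | `sisl::async::value_awaitable<T>`: `.complete()` on the commit thread, `co_await` in the issuing coroutine |
| `folly::collectAllUnsafe(vec)` / `collectAll` | `sisl::async::when_all(std::vector<task<…>>)` |
| `folly::makeFuture` / `makeSemiFuture(x)` | `co_return x` (or a ready task) |
| `folly::makeUnexpected(e)` | `co_return std::unexpected(e)` |
| `folly::Unit` / `NullResult` | `std::monostate` / a `status`-style alias |
| `.get()` (tests, control plane) | `detail::sync_get(...)`, off-reactor only |
| `folly::Init`, `folly::InlineExecutor` / `.via(...)` | delete; coroutines need no executor, and the start/detach shim uses `exec::inline_scheduler` |
| `folly::ConcurrentHashMap` | pick one: `std::unordered_map` + `std::shared_mutex` (what homeblocks did), or a sisl concurrent map. The mem-backend shard/blob maps and the GC blob index use it. |
| `folly::MPMCQueue` (GC) | a sisl/std concurrent queue; see hard part #3 |
| `folly::Uri` (endpoint parse) | `boost::urls` or a 5-line hand-roll |
| `r_cast` / `s_cast` / `uintptr_cast` / ... (sisl v14 removed them) | real casts |
| bare `Clock` | `sisl::Clock` |
| `MetricsGroupWrapper` | `MetricsGroup` |
| `ReportFormat::kTextFormat` | `TEXT_FORMAT` |

homestore v8 renames you'll hit everywhere: ReplDev → repl_dev, ReplDevListener → repl_dev_listener, ReplApplication → repl_application, BlkId → blk_id, MultiBlkId → multi_blk_id, AsyncReplResult → async_status, ReplResult → result, headers .h → .hpp. And the truthiness flip: alloc_blks(...) now returns status, where a value means success, so invert the old if (result) checks that treated a truthy result as an error.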
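A minimal sketch of the std::unordered_map + std::shared_mutex option from the dictionary, assuming call sites only need insert / find / erase. The class name locked_map is hypothetical (not homeblocks's actual type), and find() returns a copy rather than an iterator, so any folly::ConcurrentHashMap call site that holds an iterator will need a small rewrite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for folly::ConcurrentHashMap: readers share the lock,
// writers take it exclusively. Values are returned by copy, so no reference
// into the map escapes the lock.
template <typename K, typename V>
class locked_map {
public:
    // Returns true if inserted, false if the key was already present.
    bool insert(K key, V value) {
        std::unique_lock lk(mtx_);
        return map_.emplace(std::move(key), std::move(value)).second;
    }

    void insert_or_assign(K key, V value) {
        std::unique_lock lk(mtx_);
        map_.insert_or_assign(std::move(key), std::move(value));
    }

    std::optional<V> find(const K& key) const {
        std::shared_lock lk(mtx_);
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

    bool erase(const K& key) {
        std::unique_lock lk(mtx_);
        return map_.erase(key) > 0;
    }

    std::size_t size() const {
        std::shared_lock lk(mtx_);
        return map_.size();
    }

private:
    mutable std::shared_mutex mtx_;
    std::unordered_map<K, V> map_;
};
```

Lookups (find, size) take a shared lock and can run concurrently; every mutation takes the exclusive lock. If profiling later shows a write-heavy hot path, the sisl concurrent map option may be the better pick.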
Keep an internal coro_helpers.hpp (sync_get / detach) like homeblocks — do not publish it.
The hard parts (homeobject-specific — design, not sed)
1. Real multi-member raft listener — replication_state_machine.{hpp,cpp} (critical path)
homeblocks used a solo repl_dev; its listener was a stub. homeobject's ReplicationStateMachine is a
full repl_dev_listener: on_commit / on_pre_commit / on_rollback / on_error / on_fetch_data / on_snapshot / create_snapshot / on_start_replace_member / on_remove_member / notify_committed_lsn / ...
The v8 repl_dev_listener interface changed (snake_case plus signature churn): re-derive every override
against the v8 header instead of assuming. The async ones:
- on_fetch_data(...): folly::Future<std::error_code> → the v8 task type.
- create_snapshot(...): AsyncReplResult<> → async_status (co_return ok()).
- repl_result_ctx<T>::promise_ (folly::Promise) → value_awaitable. This is exactly the homeblocks on_write / repl_result_ctx pattern (the commit thread calls .complete(ok()); the async_alloc_write coroutine co_awaits it). Lift it verbatim.
This blocks all PG/shard/blob ops — do it first after the public API.
2. Snapshot / baseline-resync — none of this existed in homeblocks
create_snapshot / read_snapshot_obj / write_snapshot_obj + SnapshotReceiveHandler + pg_blob_iterator return Folly futures / AsyncReplResult and walk the blob index. Convert to task;
the iterator becomes a coroutine. Budget real time here — it's the second-heaviest area after the managers.
3. GC manager — Folly executors in the work path (the trickiest non-listener piece)
gc_manager runs on two folly::IOThreadPoolExecutor pools (normal + emergent) + a folly::MPMCQueue + collectAllUnsafe over per-chunk tasks. This is executor use in the actual work path (homeblocks's vol_gc was a single reactor timer — nothing like it). Plan:
- Pools → iomgr reactors (run_on / run_on_forget on worker reactors) or a sisl thread pool; per-chunk fan-out → when_all.
- MPMCQueue → a sisl/std concurrent queue.
- Deadlock watch: GC calls blocking-ish control-plane ops (data_service alloc/read/free). If GC coroutines run on reactors and sync_get blocking work that itself needs reactors, you hit the sync_get-on-reactor deadlock (homeblocks memory sync-get-on-reactor-deadlock). Either co_await, or keep the blocking waits off reactors.
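For the MPMCQueue replacement, a mutex + condition_variable queue is the simplest std-only option. This sketch is an assumption, not a sisl type: unlike folly::MPMCQueue it is unbounded, and close() lets workers drain and exit. pop() blocks, so per the deadlock note it belongs on dedicated GC threads, never on an iomgr reactor.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Minimal std-only MPMC queue: a possible stand-in for folly::MPMCQueue in GC.
// Unbounded, unlike folly::MPMCQueue; close() lets blocked consumers exit.
template <typename T>
class mpmc_queue {
public:
    void push(T v) {
        {
            std::lock_guard lk(mtx_);
            q_.push_back(std::move(v));
        }
        cv_.notify_one();
    }

    // Blocks until an item arrives or the queue is closed.
    // Returns nullopt only when the queue is closed AND drained.
    std::optional<T> pop() {
        std::unique_lock lk(mtx_);
        cv_.wait(lk, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        std::optional<T> v(std::move(q_.front()));
        q_.pop_front();
        return v;
    }

    void close() {
        {
            std::lock_guard lk(mtx_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<T> q_;
    bool closed_ = false;
};
```

A GC dispatcher would push chunk ids, worker threads would loop on while (auto chunk = q.pop()), and shutdown would call close() so the workers drain the remaining items and exit.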
4. Error type — keep {code, current_leader}; do NOT flatten to std::error_condition
homeblocks adopted homestore's std::error_condition because VolumeError had no payload. homeobject's ShardError/BlobError carry std::optional<peer_id_t> current_leader (the NOT_LEADER client redirect), and that must survive. Recommended shape:
- Manager surface stays its own type: Result<T> = std::expected<T, ShardError> (resp. BlobError, PGError), AsyncResult<T> = sisl::async::task<Result<T>>.
- At the homestore boundary, co_await homestore's task<result<T>> (error = std::error_condition) and translate to ShardError/BlobError, filling current_leader from repl_dev->get_leader_id().
So only the coroutine type (SemiFuture→task) and the std::expected substrate change; homeobject's public
error stays richer than homestore's. This is the main place the homeblocks recipe does not apply.
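A sketch of the boundary translation recommended above, written against plain <system_error> so it compiles as C++20. Everything homestore-side here is hypothetical (hs_repl_err and its category stand in for whatever v8 actually uses), peer_id_t is simplified to an integer, and the ShardErrorCode values are an illustrative subset. The Result<T> = std::expected wrapper is left out; in the real manager the returned ShardError would go into std::unexpected(...).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical: the real peer_id_t comes from nuraft_mesg (a UUID type).
using peer_id_t = std::uint64_t;

// Illustrative subset of homeobject's shard error codes.
enum class ShardErrorCode { UNKNOWN, TIMEOUT, NOT_LEADER, NO_SPACE_LEFT };

struct ShardError {
    ShardErrorCode code;
    std::optional<peer_id_t> current_leader;  // set only for NOT_LEADER redirects
};

// Hypothetical stand-in for homestore v8's replication error category.
enum class hs_repl_err { not_leader = 1, timeout, no_space };

class hs_repl_category_t : public std::error_category {
public:
    const char* name() const noexcept override { return "hs_repl (hypothetical)"; }
    std::string message(int ev) const override {
        switch (static_cast<hs_repl_err>(ev)) {
        case hs_repl_err::not_leader: return "not leader";
        case hs_repl_err::timeout: return "timeout";
        case hs_repl_err::no_space: return "no space";
        }
        return "unknown";
    }
};

const std::error_category& hs_repl_category() {
    static hs_repl_category_t cat;
    return cat;
}

std::error_condition make_condition(hs_repl_err e) { return {static_cast<int>(e), hs_repl_category()}; }

// Homestore's flat error_condition in, homeobject's richer ShardError out.
// `leader` is whatever the caller read from the repl dev (repl_dev->get_leader_id() in the text).
ShardError to_shard_error(const std::error_condition& ec, std::optional<peer_id_t> leader) {
    if (ec.category() == hs_repl_category()) {
        switch (static_cast<hs_repl_err>(ec.value())) {
        case hs_repl_err::not_leader: return {ShardErrorCode::NOT_LEADER, leader};
        case hs_repl_err::timeout: return {ShardErrorCode::TIMEOUT, std::nullopt};
        case hs_repl_err::no_space: return {ShardErrorCode::NO_SPACE_LEFT, std::nullopt};
        }
    }
    return {ShardErrorCode::UNKNOWN, std::nullopt};
}
```

Only NOT_LEADER carries current_leader, which is exactly the payload that flattening to std::error_condition would drop. Anything the mapping does not recognize becomes UNKNOWN rather than being guessed.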
5. HTTP admin — Pistache → sisl httplib
hs_http_manager (trigger_gc / snapshot / membership / metrics) is on Pistache; homeblocks's HTTP surface was
small (an iomgr → sisl httplib move). Port the handlers to sisl's http_server (httplib) as homeblocks did; the collectAllUnsafe / .via(InlineExecutor) calls inside handlers become when_all / direct co_await.
6. CP callbacks — hs_cp_callbacks.cpp
MyCPCallbacks::cp_flush returns folly::Future<bool>; it becomes sisl::async::task<bool> (mirror homestore
cp_mgr's cp_start_flush). on_switchover_cp / cp_cleanup / cp_progress_percent are synchronous. homeblocks
didn't implement CPCallbacks (it used homestore's), so this is new work, but a small surface.
Suggested sequence (bottom-up; keep it building green at each step)
1. conanfile: bump deps (table above), C++23, editable. Get it configuring before touching code.
2. Public API (common.hpp + pg/shard/blob headers): Manager<E> aliases → task / std::expected; keep the {code, leader} error structs (hard part #4). This is the contract; defer its exact shape to the owner, as homeblocks did.
3. Memory backend: makeSemiFuture / makeUnexpected → co_return / std::unexpected; ConcurrentHashMap → unordered_map + shared_mutex.
4. homestore_backend core: hs_homeobject init (iomgr_params, format_and_start) → the repl_application → the replication_state_machine (hard part #1) → the three managers (co_await repl_dev::async_alloc_write / async_read; repl_result_ctx Promise → value_awaitable).
5. index_kv (IndexTable wrappers): put/get/query/remove → status; destroy() is async, so co_await the forced CP flush, and don't run it on a reactor you then block (the homeblocks IndexTable::destroy deadlock lesson).
6. Tests (fixture_app): drop folly::Init; .get() → sync_get; collectAll → when_all; build the config the new init takes.
7. Drop Folly from the conanfile and any find_package / includes; grep clean.
Gotchas (carry straight over from homeblocks)
- sync_get / sync_wait is safe only OFF a reactor: it parks the reactor's iomgr loop. Never sync_get a CP-flush- or repl-awaiting op on a worker reactor; co_await it. homeobject's GC and destroy paths are the risk areas. (homeblocks memory: sync-get-on-reactor-deadlock.)
- exec::task is lazy: an un-driven task never runs, and std::ignore = task is a silent no-op. co_await / sync_get / detach it, and mark async entry points [[nodiscard]].
- Keep buffers / sg_lists alive across co_await (frame-owned), like homeblocks's sgs_keepalive.
- alloc_blks truthiness flipped: a value now means success.
- uintptr_cast expansion trap: when expanding the removed sisl cast macros, uintptr_cast(p) is reinterpret_cast<uint8_t*>(p), not a static_cast and not a cast to uint32_t*. homeblocks had 5 latent pointer bugs from a bad expansion (superblock chunk-id pointers). Grep every uintptr_cast site.
- Coroutine purity: co_await / co_return only; don't re-invert control flow with a detach_then-style callback.
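The uintptr_cast trap comes down to pointer-arithmetic scaling: adding n to a uint8_t* moves n bytes, while adding n to a uint32_t* moves n elements (4n bytes). The sketch below uses a hypothetical superblock layout (sb_header, chunk_ids_ok and chunk_ids_bad are illustrative names, not homeblocks code) to show how the bad expansion lands on the wrong slot while still compiling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical superblock layout: an 8-byte header followed by chunk ids.
struct sb_header {
    std::uint32_t magic;
    std::uint32_t num_chunks;
};

// Correct expansion: uintptr_cast(p) is reinterpret_cast<uint8_t*>(p), so
// adding sizeof(sb_header) advances that many BYTES.
std::uint32_t* chunk_ids_ok(void* sb) {
    return reinterpret_cast<std::uint32_t*>(reinterpret_cast<std::uint8_t*>(sb) + sizeof(sb_header));
}

// Bad expansion: casting to uint32_t* first makes the same "+ sizeof(sb_header)"
// advance that many ELEMENTS, i.e. 4x too far.
std::uint32_t* chunk_ids_bad(void* sb) {
    return reinterpret_cast<std::uint32_t*>(sb) + sizeof(sb_header);
}
```

With an 8-byte header, the correct expansion points at byte 8 while the bad one points at byte 32; as long as the buffer is large enough nothing faults, which is how such bugs stay latent.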
Effort (from the survey; one engineer)
| Area | Effort |
|---|---|
| Public API (`Manager<E>`, 3 manager headers) | 2–3 d |
| Memory backend | 1–2 d |
| replication_state_machine (critical path) | 2–3 d |
| HS BlobManager (heaviest file) | 3–4 d |
| HS ShardManager | 2 d |
| HS PGManager | 2–3 d |
| GC manager (executors → reactors/pool) | 3–4 d |
| Snapshot / baseline-resync | (folded into resync work) |
| HTTP manager (Pistache → httplib) | 1–2 d |
| CP callbacks | 1 d |
| Tests & fixtures | 2–3 d |
~20–27 person-days. Critical path: Public API → replication_state_machine → BlobManager.