perf(adapter/megatron): do sync save in main thread, not separate process#107

Draft
g-husam wants to merge 4 commits into main from feature/sync-save-optim

g-husam (Collaborator) commented May 7, 2026

In Megatron's AsyncSaveShardedStrategy, the default synchronous save implementation (the save() method) simply calls async_save(...) and schedules the resulting request, which starts a worker process to do the work. It then blocks until that process finishes and cleans the process up if it is not a persistent worker.

This adds unnecessary overhead to synchronous saving: the caller blocks either way, so all of that work can be done directly in the main thread, which is simpler. This change does exactly that.
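A minimal sketch of the idea, assuming a simplified save request that bundles the write work with its finalization callbacks. `SaveRequest` and `sync_save_in_main_thread` are hypothetical names used only for illustration; they are not Megatron's or this PR's actual API:

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class SaveRequest:
    """Hypothetical, simplified save request: the write work plus the
    callbacks that finalize the checkpoint. Not Megatron's actual class."""
    write_fn: Callable[..., Any]
    write_args: Tuple[Any, ...] = ()
    finalize_fns: List[Callable[[], None]] = field(default_factory=list)


def sync_save_in_main_thread(request: SaveRequest) -> None:
    """Run a save synchronously in the calling thread.

    Instead of scheduling the request on a worker process and blocking
    until that process exits, do the write directly here, then finalize.
    """
    request.write_fn(*request.write_args)
    # Finalization runs only after the write has completed, in order.
    for finalize in request.finalize_fns:
        finalize()


if __name__ == "__main__":
    events = []
    req = SaveRequest(
        # Record which thread performed the write.
        write_fn=lambda path: events.append(("write", path, threading.current_thread().name)),
        write_args=("/tmp/ckpt",),
        finalize_fns=[lambda: events.append(("finalize",))],
    )
    sync_save_in_main_thread(req)
    print(events)  # [('write', '/tmp/ckpt', 'MainThread'), ('finalize',)]
```

Compared with the default path, there is no worker process to start, wait on, or tear down. The caller still blocks for the duration of the write, which a synchronous save does anyway.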

github-actions bot commented May 8, 2026

Python Code Coverage Summary

| Package | Line Rate | Branch Rate |
|---|---|---|
| src.ml_flashpoint | 100% | 100% |
| src.ml_flashpoint.adapter | 100% | 100% |
| src.ml_flashpoint.adapter.megatron | 97% | 97% |
| src.ml_flashpoint.adapter.nemo | 98% | 93% |
| src.ml_flashpoint.adapter.pytorch | 99% | 92% |
| src.ml_flashpoint.checkpoint_object_manager | 93% | 93% |
| src.ml_flashpoint.core | 95% | 92% |
| src.ml_flashpoint.replication | 83% | 83% |
| Summary | 95% (2413 / 2540) | 92% (586 / 638) |

Minimum allowed line rate is 90%

github-actions bot commented May 8, 2026

C++ Code Coverage Summary

| Package | Line Rate | Branch Rate |
|---|---|---|
| src.ml_flashpoint.checkpoint_object_manager.buffer_object | 93% | 54% |
| src.ml_flashpoint.replication.transfer_service | 79% | 41% |
| Summary | 82% (899 / 1097) | 44% (670 / 1539) |

Minimum allowed line rate is 80%
