fix(phlower): collapse hourly_workers Counter to worker group#28
Merged
Conversation
…ng memory leak TaskAggregate.hourly_workers was keyed on the full Celery hostname, which in K8s embeds a per-pod hash. Rolling deploys and karpenter scaling churn pod names constantly, so each hourly Counter accumulated hundreds of unique string keys that never collapsed back down. With aggregate_retention_hours=168 the structure grew linearly for ~5 days before tripping the 6 GB pod limit — matches the observed OOM cadence. Switch the aggregate keys to worker_group (already derived in events.py and stored on InvocationRecord) so the Counter has a tiny, stable set of keys instead. SQLite invocation rows still carry the full hostname for per-invocation drill-in; only the aggregate collapses. Recovery path was leaking the same way — _flush_counts replays the SQLite worker column verbatim. Apply extract_worker_group there too, otherwise every restart would rehydrate the leak from history. Bump SNAPSHOT_VERSION so existing on-disk hourly_workers blobs are discarded and aggregates get rebuilt clean from row replay. Pyroscope memory profiling was discussed alongside this fix but the pyroscope-io Python client only ships CPU sampling — not memory. Leaving it out; tracemalloc/memray would be a separate change.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TaskAggregate.hourly_workerswas keyed on the full Celery hostname, which under K8s embeds a per-pod hash. With rolling deploys churning pod names, each hourly Counter accumulated hundreds of unique string keys that never collapsed back down. Withaggregate_retention_hours=168(7d), the structure grew linearly for ~5 days until the pod hit its memory limit and got OOMKilled — then started over.Fix: aggregate by
worker_group(the stable name derived inevents.pyand already stored onInvocationRecord) instead of the raw hostname. The Counter now has a tiny bounded set of keys. SQLite invocation rows still carry the full hostname so per-invocation drill-in is unchanged.The recovery path was leaking the same way —
_flush_countswas replaying the SQLiteworkercolumn verbatim. Group resolution now happens in_load_countswhile building the batch, so the regex runs outside the store lock instead of blocking live ingest.SNAPSHOT_VERSIONbumped to 2 so existing on-diskhourly_workersblobs are discarded and aggregates rebuild clean from row replay.Test plan