Add per-entity top-k query support #383
Draft
zipdoki wants to merge 1 commit into
Introduces the topk API, which serves pre-ranked item lists per entity in O(K) by scanning a score index. The score table (an EDGE table with a score DESC index) is kept in sync by a background job that reads aggregated counts from EdgeGroup and upserts composite-keyed score entries. Group gains a topk field that declares which score table a group feeds, so the background job can discover its targets automatically.
Contributor
@zipdoki Great start — looks like a lot of thought went into this. 👏 This is a complex flow, so before going into code-level details, I'd like to align on whether the current direction can support the scenarios we want to build. Example scenario: "Top 10 most-purchased items over the past year (365-day rolling window)", where |
Summary
Introduces the engine foundation for per-entity top-k queries (#369).
Score entries use a composite source key `"{source}:{topk_name}"`, allowing all ranking variants to share a single EDGE table while remaining independently scannable per user per dimension. Top-k reads scan the score index in O(K).

The background job flow is: read aggregated counts from EdgeGroup via `multiEdgeCount`, then upsert into the score table. The engine automatically maintains the score index on each upsert, so rankings stay sorted without additional work.

Group metadata - declare a `topk` target on any group to opt in:

```json
{
  "groups": [
    {
      "group": "_count",
      "type": "COUNT",
      "fields": [ { "name": "_target" } ],
      "directionType": "OUT",
      "ttl": 9223372036854776000,
      "topk": "top_purchased"
    },
    {
      "group": "_count_1y",
      "type": "COUNT",
      "fields": [
        { "name": "_target" },
        {
          "name": "day",
          "bucket": {
            "type": "date",
            "unit": "MILLISECOND",
            "timezone": "+09:00",
            "format": "yyyy-MM-dd"
          }
        }
      ],
      "directionType": "OUT",
      "ttl": 31536000000,
      "topk": "top_purchased_1y"
    }
  ]
}
```

Score table - a plain EDGE table with a score DESC index:

```json
{
  "table": "_{table}_score",
  "schema": {
    "type": "EDGE",
    "source": { "type": "STRING", "comment": "{user}:{topk_name}" },
    "target": { "type": "STRING", "comment": "item_id" },
    "properties": [
      { "name": "score", "type": "LONG", "nullable": false }
    ],
    "direction": "OUT",
    "indexes": [
      { "name": "score", "fields": [ { "field": "score", "order": "DESC" } ] }
    ]
  }
}
```

Top-k query
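To make the flow concrete, here is a minimal in-memory Python sketch of the composite key, the background-job upsert, and the top-k read. It is an illustration under assumptions, not the engine's API: `ScoreTable`, `score_key`, and `sync_scores` are hypothetical names, the `counts` dict merely stands in for whatever `multiEdgeCount` returns, and the sketch sorts at read time for brevity, whereas the design described above relies on an engine-maintained score DESC index to read in O(K).

```python
from collections import defaultdict


def score_key(source: str, topk_name: str) -> str:
    """Build the composite source key "{source}:{topk_name}"."""
    return f"{source}:{topk_name}"


class ScoreTable:
    """Hypothetical in-memory stand-in for the score-indexed EDGE table."""

    def __init__(self):
        # composite source key -> {target: score}
        self._scores = defaultdict(dict)

    def upsert(self, source_key: str, target: str, score: int) -> None:
        # Insert or overwrite the score for one (source_key, target) edge.
        self._scores[source_key][target] = score

    def topk(self, source: str, topk_name: str, k: int):
        # Sorting here is a simplification; a real score DESC index
        # would let the read stop after K entries.
        rows = self._scores.get(score_key(source, topk_name), {})
        ranked = sorted(rows.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:k]


def sync_scores(table: ScoreTable, counts: dict, topk_name: str) -> None:
    """Simulated background job: copy aggregated counts into the score table.

    `counts` maps (source, target) -> count and is an assumed shape,
    standing in for the result of multiEdgeCount.
    """
    for (source, target), count in counts.items():
        table.upsert(score_key(source, topk_name), target, count)


table = ScoreTable()
counts = {
    ("user1", "itemA"): 5,
    ("user1", "itemB"): 9,
    ("user1", "itemC"): 2,
    ("user2", "itemA"): 1,
}
sync_scores(table, counts, "top_purchased")
top2 = table.topk("user1", "top_purchased", 2)
print(top2)
```

Because the ranking name is part of the key, `user1:top_purchased` and `user1:top_purchased_1y` are separate scan ranges in the same table, which is why a query for a ranking that has not been synced yet simply returns no rows.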
Changes
- `topk` field to `Group` to declare which score table a group feeds
- `multiEdgeCount` API: aggregated (source, target) pair count with optional time-window ranges
- `topk` API: scans a score-indexed EDGE table in O(K) using composite source key `"{source}:{topk_name}"`
- `PerEntityTopKSpec` covering metadata definition, background job simulation, and top-k query

How to Test
AI Assistance