Add per-entity top-k query support by zipdoki · Pull Request #383 · kakao/actionbase

zipdoki · 2026-06-14T21:39:45Z

Summary

Introduces the engine foundation for per-entity top-k queries (#369).

Score entries use a composite source key "{source}:{topk_name}", allowing all ranking variants to share a single EDGE table while remaining independently scannable per user per dimension. Top-k reads scan the score index in O(K).

The background job flow is: read aggregated counts from EdgeGroup via multiEdgeCount, then upsert into the score table. The engine automatically maintains the score index on each upsert, so rankings stay sorted without additional work.

Group metadata - declare a topk target on any group to opt in:

{
"groups": [
  {
    "group": "_count",
    "type": "COUNT",
    "fields": [
      {
        "name": "_target"
      }
    ],
    "directionType": "OUT",
    "ttl": 9223372036854776000,
    "topk": "top_purchased"
  },
  {
    "group": "_count_1y",
    "type": "COUNT",
    "fields": [
      {
        "name": "_target"
      },
      {
        "name": "day",
        "bucket": {
          "type": "date",
          "unit": "MILLISECOND",
          "timezone": "+09:00",
          "format": "yyyy-MM-dd"
        }
      }
    ],
    "directionType": "OUT",
    "ttl": 31536000000,
    "topk": "top_purchased_1y"
  }
]
}

Score table - a plain EDGE table with a score DESC index:

{
  "table": "_{table}_score",
  "schema": {
    "type": "EDGE",
    "source": {
      "type": "STRING",
      "comment": "{user}:{topk_name}"
    },
    "target": {
      "type": "STRING",
      "comment": "item_id"
    },
    "properties": [
      {
        "name": "score",
        "type": "LONG",
        "nullable": false
      }
    ],
    "direction": "OUT",
    "indexes": [
      {
        "name": "score",
        "fields": [
          {
            "field": "score",
            "order": "DESC"
          }
        ]
      }
    ]
  }
}

Top-k query

$ curl -X GET "/graph/v3/databases/{database}/tables/_{table}_score/edges/top-k/{topk}?start=user_A&direction=OUT&limit=10"

{
  "edges": [
    {
      "source": "user_A",
      "target": "item_X",
      "properties": {
        "score": 42
      }
    },
    {
      "source": "user_A",
      "target": "item_Y",
      "properties": {
        "score": 31
      }
    }
  ]
}

Changes

Add topk field to Group to declare which score table a group feeds
Add multiEdgeCount API — aggregated (source, target) pair count with optional time-window ranges
Add topk API — scans a score-indexed EDGE table in O(K) using composite source key "{source}:{topk_name}"
Add PerEntityTopKSpec covering metadata definition, background job simulation, and top-k query

How to Test

AI Assistance

This PR was written largely with AI assistance.
- Tool / model:

Introduces the topk API that serves pre-ranked item lists per entity in O(K) by scanning a score index. The score table (EDGE with score DESC index) is kept in sync by a background job that reads aggregated counts from EdgeGroup and upserts composite-keyed score entries. Group gains a topk field to declare which score table a group feeds, enabling the background job to discover targets automatically

em3s · 2026-06-15T01:56:43Z

@zipdoki Great start — looks like a lot of thought went into this. 👏

This is a complex flow, so before going into code-level details, I'd like to align on whether the current direction can support the scenarios we want to build.

Example scenario: "Top 10 most-purchased items over the past year (365-day rolling window)", where user_A purchases 12 items (Product_A ~ Product_L). Assume sweep → ingestion → query happens once a day.

2025-06-01
  sweep:     nothing to expire
  ingestion: A +10, B +9, C +8, D +7, E +6, F +5, G +4, H +3, I +2, J +1
  query:     [A:10, B:9, C:8, D:7, E:6, F:5, G:4, H:3, I:2, J:1]

2025-06-02
  sweep:     nothing to expire
  ingestion: J +15, I +10
  query:     [J:16, I:12, A:10, B:9, C:8, D:7, E:6, F:5, G:4, H:3]

2025-06-03
  sweep:     nothing to expire
  ingestion: N/A
  query:     [J:16, I:12, A:10, B:9, C:8, D:7, E:6, F:5, G:4, H:3]

2025-06-04
  sweep:     nothing to expire
  ingestion: A +5, K +4, L +3
  query:     [J:16, A:15, I:12, B:9, C:8, D:7, E:6, F:5, G:4, H:3]
             (K:4, L:3 fall outside the top 10)

... (2025-06-05 ~ 2026-05-31: no activity, nothing for sweep to expire)

2026-06-01
  sweep:     expire 2025-06-01 → A: 15→5, B~H: all 0, I: 12→10, J: 16→15
  ingestion: N/A
  query:     [J:15, I:10, A:5, K:4, L:3]
             (B~H drop to 0; K, L get promoted into the top 10)

2026-06-02
  sweep:     expire 2025-06-02 → J: 15→0, I: 10→0
  ingestion: N/A
  query:     [A:5, K:4, L:3]

2026-06-03
  sweep:     nothing to expire (no activity on 2025-06-03)
  ingestion: N/A
  query:     [A:5, K:4, L:3]

2026-06-04
  sweep:     expire 2025-06-04 → A: 5→0, K: 4→0, L: 3→0
  ingestion: N/A
  query:     []

zipdoki self-assigned this Jun 14, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add per-entity top-k query support#383

Add per-entity top-k query support#383
zipdoki wants to merge 1 commit into
feat/multi-edge-count-by-aggfrom
feat/per-entity-top-k

zipdoki commented Jun 14, 2026

Uh oh!

em3s commented Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

zipdoki commented Jun 14, 2026

Summary

Changes

How to Test

AI Assistance

Uh oh!

em3s commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

em3s commented Jun 15, 2026 •

edited

Loading