This repository was archived by the owner on Dec 25, 2023. It is now read-only.

How to run OR subqueries concurrently #297

@leomao10

Description


In our project, we have a query like this:

TagItem.query(TagItem.template == template.key, TagItem.owner_type == owner_type).filter(TagItem.owner_uid._IN(owner_uids))

Performance gets worse as owner_uids grows. After some profiling, we found that this is because a separate datastore query is made, one after another, for each owner_uid.

[Screenshot of the profiling results, 2019-06-26]

According to the ndb docs, an IN filter is translated into an OR of equality filters:
https://cloud.google.com/appengine/docs/standard/python/ndb/queries#neq_and_in
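A minimal pure-Python model of that translation, assuming an in-memory list of entities and hypothetical helper names (`equality_subquery`, `in_query`). It only illustrates the semantics (one equality subquery per value, merged and de-duplicated) and is not ndb's implementation:

```python
# Illustrative model only: IN(values) behaves like OR(prop == v for v in values).
# ENTITIES, equality_subquery and in_query are hypothetical, not ndb APIs.
from typing import Dict, Iterable, List

# Hypothetical in-memory "entities"; each has a unique "key".
ENTITIES: List[Dict[str, object]] = [
    {"key": 1, "owner_uid": "a", "owner_type": "user"},
    {"key": 2, "owner_uid": "b", "owner_type": "user"},
    {"key": 3, "owner_uid": "c", "owner_type": "group"},
    {"key": 4, "owner_uid": "b", "owner_type": "user"},
]


def equality_subquery(prop: str, value: object) -> List[Dict[str, object]]:
    """One equality subquery: all entities where entity[prop] == value."""
    return [e for e in ENTITIES if e[prop] == value]


def in_query(prop: str, values: Iterable[object]) -> List[Dict[str, object]]:
    """IN as an OR: run one equality subquery per value, one after another,
    and merge the results, skipping keys already seen (an entity can match
    more than one subquery, e.g. when a value is repeated)."""
    seen = set()
    results = []
    for value in values:
        for entity in equality_subquery(prop, value):
            if entity["key"] not in seen:
                seen.add(entity["key"])
                results.append(entity)
    return results


print([e["key"] for e in in_query("owner_uid", ["b", "c"])])  # [2, 4, 3]
```

Each extra value in the IN list adds one more subquery, which is why the cost tracks the size of owner_uids.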

I also found this comment in the current code:

Run the subqueries sequentially; there is no order to keep.

https://github.com/GoogleCloudPlatform/datastore-ndb-python/blob/master/ndb/query.py#L1957
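A rough timing model of what running the subqueries sequentially implies, using a hypothetical `fake_subquery` that just sleeps for an assumed `LATENCY`: total wall-clock time is the sum of the per-subquery latencies, so it grows linearly with the number of owner_uids.

```python
# Timing model of sequential subqueries. LATENCY and fake_subquery are
# hypothetical stand-ins for a datastore round trip, not ndb code.
import time
from typing import List

LATENCY = 0.02  # assumed round-trip time of one subquery, in seconds


def fake_subquery(owner_uid: str) -> List[str]:
    """Stand-in for one equality subquery (owner_uid == value)."""
    time.sleep(LATENCY)  # simulate waiting on the datastore RPC
    return [f"tag-for-{owner_uid}"]


def run_sequentially(owner_uids: List[str]) -> List[str]:
    """Run one subquery after another and concatenate the results,
    which is enough when no sort order has to be preserved."""
    results: List[str] = []
    for uid in owner_uids:
        results.extend(fake_subquery(uid))
    return results


uids = [f"u{i}" for i in range(5)]
start = time.perf_counter()
rows = run_sequentially(uids)
elapsed = time.perf_counter() - start
print(len(rows), f"{elapsed:.2f}s")  # roughly 5 * LATENCY
```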

This doesn't seem to be the most efficient way to filter with an IN operation. Is there a way to change it so that the subqueries run concurrently?
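For comparison, a sketch of the concurrent alternative, using the same kind of hypothetical `fake_subquery` and a `concurrent.futures.ThreadPoolExecutor`. Threads are only a stand-in for the latency effect: ndb gets concurrency from asynchronous RPCs rather than threads, so this is not how a change inside ndb would be written. At the application level, one untested option would be to replace the single IN filter with one `owner_uid ==` query per value, started with `Query.fetch_async()` and collected with `ndb.Future.wait_all()`.

```python
# Model of issuing the OR subqueries concurrently. LATENCY and
# fake_subquery are hypothetical; threads stand in for async RPCs.
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

LATENCY = 0.05  # assumed round-trip time of one subquery, in seconds


def fake_subquery(owner_uid: str) -> List[str]:
    """Stand-in for one equality subquery (owner_uid == value)."""
    time.sleep(LATENCY)  # simulate waiting on the datastore RPC
    return [f"tag-for-{owner_uid}"]


def run_concurrently(owner_uids: List[str], max_workers: int = 10) -> List[str]:
    """Start all subqueries at once and concatenate their results.

    pool.map yields results in input order, so the output matches what
    the sequential version would return.
    """
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for rows in pool.map(fake_subquery, owner_uids):
            results.extend(rows)
    return results


uids = [f"u{i}" for i in range(5)]
start = time.perf_counter()
rows = run_concurrently(uids)
elapsed = time.perf_counter() - start
print(len(rows), f"{elapsed:.2f}s")  # close to one LATENCY, not five
```

With all five subqueries in flight at once, the elapsed time stays near a single round trip instead of growing with the number of owner_uids.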
