refactor: rearrange count_deleted_entities to run once per dataset #545

Merged
tombrooks248 merged 2 commits into main from refactor/change-count-deleted-entities-to-run-once-per-dataset on May 12, 2026

tombrooks248 commented May 11, 2026

Description

As part of the assemble-and-bake pipeline step we run dataset expectations, which include a function called count_deleted_entities. count_deleted_entities makes one HTTP request to datasette per organisation (for brownfield-site that is 342 orgs, as defined in collection-task/pipeline/expect.csv). Different pipelines have different numbers of orgs, but in the worst-case pipelines this means hundreds of HTTP requests to datasette per dataset.

This PR refactors that code so that it makes the HTTP request once per dataset, getting data for multiple organisations at once. This should reduce the load on datasette throughout the night.
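For reviewers who want a feel for the shape of the change, here is a minimal sketch of the "one request per dataset" idea. It is not the code in this PR: the base URL, the `expected_entity` table, the `organisation` and `entity` columns, and both function names are hypothetical, chosen only for illustration. It assumes Datasette's JSON SQL endpoint with named parameters and `_shape=array`, and takes the HTTP call as an injected `fetch_json` function so the batching logic can be exercised without a network.

```python
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only.
DATASETTE_URL = "https://datasette.example.org"


def build_batched_url(dataset: str, organisations: Iterable[str]) -> str:
    """Build one Datasette SQL URL that covers every organisation at once."""
    orgs = sorted(set(organisations))
    # Named parameters (:org0, :org1, ...) keep the values out of the SQL text.
    placeholders = ", ".join(f":org{i}" for i in range(len(orgs)))
    # Table and column names here are assumptions, not the real schema.
    sql = (
        "SELECT organisation, entity FROM expected_entity "
        f"WHERE organisation IN ({placeholders})"
    )
    params = {"sql": sql, "_shape": "array"}
    params.update({f"org{i}": org for i, org in enumerate(orgs)})
    return f"{DATASETTE_URL}/{dataset}.json?{urlencode(params)}"


def expected_entities_by_org(
    dataset: str,
    organisations: Iterable[str],
    fetch_json: Callable[[str], List[dict]],
) -> Dict[str, Set[int]]:
    """Fetch rows for all organisations in one request, then group locally."""
    orgs = list(organisations)
    if not orgs:
        return {}  # nothing to check, so no request is made
    rows = fetch_json(build_batched_url(dataset, orgs))  # single HTTP call
    grouped: Dict[str, Set[int]] = {org: set() for org in orgs}
    for row in rows:
        grouped.setdefault(row["organisation"], set()).add(row["entity"])
    return grouped
```

In real use `fetch_json` could be something like `lambda url: requests.get(url).json()`. With several hundred organisations the query string can get long, so a real implementation might need to split the list into a few chunks or drop the organisation filter altogether; the sketch leaves that decision open.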

Why

Datasette is getting overwhelmed between 1am and 3am each night because of all of these HTTP requests. I believe consolidating these requests will reduce the load on datasette each night and enable the pipelines to run faster.

Tests

All the automated tests are passing.
I have run the pipeline locally on ownership-status and brownfield-site; both ran without errors or failures.

When run locally, brownfield-site checked the 342 orgs and produced what looks like 312 passing and 30 genuine data quality failures (organisations with missing entities), which seems about right.

Related Tickets & Documents

QA Instructions, Screenshots, Recordings

Added/updated tests?

We encourage you to keep the code coverage percentage at 80% and above. Please refer to the Digital Land Testing Guidance for more information.

  • Yes
  • No, and this is why: please replace this line with details on why tests
    have not been included
  • I need help with writing tests

[optional] Are there any post deployment tasks we need to perform?

[optional] Are there any dependencies on other PRs or Work?

tombrooks248 force-pushed the refactor/change-count-deleted-entities-to-run-once-per-dataset branch 2 times, most recently from efc3d39 to 1b51013, on May 11, 2026 12:49
tombrooks248 force-pushed the refactor/change-count-deleted-entities-to-run-once-per-dataset branch from 1b51013 to d7df1a4 on May 11, 2026 13:36
tombrooks248 merged commit 0b264f8 into main on May 12, 2026
5 checks passed
tombrooks248 deleted the refactor/change-count-deleted-entities-to-run-once-per-dataset branch on May 12, 2026 09:39

2 participants