refactor: rearrange count_deleted_entities to run once per dataset (#545)
tombrooks248 merged 2 commits on May 12, 2026
eveleighoj approved these changes on May 12, 2026
Description
As part of the assemble-and-bake pipeline step we run dataset expectations. This includes a function called count_deleted_entities, which makes one HTTP request to datasette per organisation (for brownfield-site that is 342 orgs, as defined in collection-task/pipeline/expect.csv). Different pipelines have different numbers of orgs, but in the worst case we are talking hundreds of HTTP requests to datasette per dataset. This PR refactors that code so that it makes the HTTP request once per dataset, getting data for multiple organisations at once. This should reduce the load on datasette throughout the night.
Why
Datasette is getting overwhelmed between the hours of 1am and 3am each night because of all of these HTTP requests. I believe consolidating these requests will reduce the load on datasette each night and enable the pipelines to run faster.
Tests
All the automated tests are passing.
I have run the pipeline locally on ownership-status and brownfield-site. Everything seemed to run fine, with no errors or failures. When run locally, brownfield-site checked the 342 orgs and produced 312 passes and 30 genuine data quality failures (organisations with missing entities), which seems about right.
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
Added/updated tests?
We encourage you to keep the code coverage percentage at 80% and above. Please refer to the Digital Land Testing Guidance for more information.
Tests have not been included.
[optional] Are there any post deployment tasks we need to perform?
[optional] Are there any dependencies on other PRs or Work?