fix: race-safe add_processed_item via atomic upsert (#2626) #2627

Merged: chubes4 merged 1 commit into main from fix-processed-items-dupkey-race on Jun 15, 2026
@chubes4 chubes4 commented Jun 15, 2026

Member

Summary

Fixes #2626. ProcessedItems->add_processed_item() ran an UPDATE followed by a bare INSERT, which left a window in which concurrent Action Scheduler workers processing the same source item could both reach the INSERT and collide on the flow_source_item unique key. wpdb->insert() logged a hard Duplicate entry DB error to debug.log on every collision (~2,809 entries/day on the events site, blog 7), even though the code then detected the duplicate-key error and returned true.

The unique key was correctly preventing dupes — the bug was that the bare INSERT logged a hard DB error on every race instead of treating "already processed" as a normal no-op.
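
As a rough illustration of the race described above, the sketch below replays the losing interleaving deterministically. It uses Python's stdlib sqlite3 as a stand-in for MySQL and wpdb, so the error is an `IntegrityError` rather than a logged Duplicate entry message. The table name `processed_items`, the key values, and the helper names are assumptions for illustration, not the plugin's actual code.

```python
import sqlite3

# In-memory stand-in table. Column names follow the PR text; the table name
# and the reduced column set are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE processed_items (
        flow_step_id    TEXT NOT NULL,
        source_type     TEXT NOT NULL,
        item_identifier TEXT NOT NULL,
        job_id          INTEGER,
        UNIQUE (flow_step_id, source_type, item_identifier)
    )
    """
)

KEY = ("step_1", "ticketmaster", "event-42")  # hypothetical source item


def try_update(job_id):
    """Step 1 of the old path: UPDATE an existing row, return rows affected."""
    cur = conn.execute(
        "UPDATE processed_items SET job_id = ? "
        "WHERE flow_step_id = ? AND source_type = ? AND item_identifier = ?",
        (job_id, *KEY),
    )
    return cur.rowcount


def bare_insert(job_id):
    """Step 2 of the old path: plain INSERT, which fails on the unique key."""
    try:
        conn.execute(
            "INSERT INTO processed_items "
            "(flow_step_id, source_type, item_identifier, job_id) "
            "VALUES (?, ?, ?, ?)",
            (*KEY, job_id),
        )
        return "inserted"
    except sqlite3.IntegrityError:
        return "duplicate"


# Interleaving that loses the race: both workers see no row to update,
# then both fall through to the INSERT.
updated_a = try_update(1)
updated_b = try_update(2)
result_a = bare_insert(1)
result_b = bare_insert(2)
```

Both UPDATE calls affect zero rows, worker A's INSERT succeeds, and worker B's INSERT hits the unique constraint. The data stays correct (one row), but the second worker pays for it with a database error, which is the noise this PR removes.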

Fix

Replaced the UPDATE-then-bare-INSERT pair with a single atomic INSERT ... ON DUPLICATE KEY UPDATE keyed on flow_source_item:

  • Fresh item → inserted as processed.
  • Existing in-flight claimed row → converted to final processed state (new job_id/processed_timestamp, claim_expires_at cleared) — same end state as the old UPDATE path.
  • Concurrent racing worker → clean no-op update, no logged Duplicate entry DB error.

A check-then-insert would not be sufficient (the race is between check and insert); this is a single atomic statement. Dedup behavior is unchanged: exactly one row per (flow_step_id, source_type, item_identifier).
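
The three outcomes listed above can be sketched with a single upsert statement. This is a minimal sketch under stated assumptions, not the merged implementation: it runs on Python's stdlib sqlite3, where `ON CONFLICT ... DO UPDATE` plays the role of MySQL's `ON DUPLICATE KEY UPDATE`, and the table name `processed_items`, the `status` column, and the Python function signature are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a unique key on the dedup triple."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE processed_items (
            id                  INTEGER PRIMARY KEY,
            flow_step_id        TEXT NOT NULL,
            source_type         TEXT NOT NULL,
            item_identifier     TEXT NOT NULL,
            job_id              INTEGER,
            status              TEXT NOT NULL,  -- hypothetical column
            processed_timestamp TEXT,
            claim_expires_at    TEXT,
            UNIQUE (flow_step_id, source_type, item_identifier)
        )
        """
    )
    return conn


# One statement: insert a fresh row, or convert the existing row to the
# final processed state. No separate check, so no window between steps.
UPSERT = """
INSERT INTO processed_items
    (flow_step_id, source_type, item_identifier,
     job_id, status, processed_timestamp, claim_expires_at)
VALUES (?, ?, ?, ?, 'processed', ?, NULL)
ON CONFLICT (flow_step_id, source_type, item_identifier) DO UPDATE SET
    job_id              = excluded.job_id,
    status              = 'processed',
    processed_timestamp = excluded.processed_timestamp,
    claim_expires_at    = NULL
"""


def add_processed_item(conn, flow_step_id, source_type, item_identifier,
                       job_id, now):
    """Mark an item processed; a duplicate is a normal update, not an error."""
    conn.execute(
        UPSERT, (flow_step_id, source_type, item_identifier, job_id, now)
    )
    conn.commit()
    return True
```

Calling `add_processed_item` twice for the same triple leaves exactly one row and raises nothing, and a pre-existing claimed row ends up processed with `claim_expires_at` cleared. In MySQL the update clause would refer to the incoming values with `VALUES(col)` or a row alias instead of SQLite's `excluded`.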

Verification

  • php -l clean on the changed file.
  • composer lint (phpcs) on the changed file exits 0.
  • php tests/processed-item-claims-smoke.php → 21 assertions, 0 failures.
  • Validated the exact SQL against real MySQL on a temp table with the production schema:
    • pre-existing claimed row → converted to processed, claim cleared
    • two racing upserts on the same item → exactly one row, no duplicate-key error
    • fresh item → inserted normally
    • total rows = one per unique source item (dedup preserved)

The fix covers all affected handlers (universal_web_scraper, dice_fm, ticketmaster, vision_flyer) because it lives in the shared write path.

Fixes #2626

Concurrent Action Scheduler workers processing the same source item
raced between the UPDATE and the bare INSERT in add_processed_item. The
loser hit the flow_source_item unique key and wpdb->insert() logged a
hard 'Duplicate entry' DB error to debug.log on every collision (~2,809
entries/day on the events site) even though the duplicate was caught and
treated as success afterward.

Replace the UPDATE-then-INSERT pair with a single atomic
INSERT ... ON DUPLICATE KEY UPDATE keyed on flow_source_item. A duplicate
is now a normal no-op update (no logged DB error), an existing in-flight
claim is still converted to final processed state, and dedup behavior is
unchanged: exactly one row per source item.

Fixes #2626

homeboy-ci Bot commented Jun 15, 2026

Contributor

Homeboy Results — data-machine

Lint

lint — passed

ℹ️ Full options: homeboy docs commands/lint
Deep dive: homeboy lint data-machine --changed-since 6ea31fe

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-lint-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-lint-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/27551393404

Test

test — passed

ℹ️ No impacted tests found for --changed-since 6ea31fe
ℹ️ Run full suite if needed: homeboy test data-machine
Deep dive: homeboy test data-machine --changed-since 6ea31fe

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-test-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-test-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/27551393404

Audit

audit — passed

  • audit — 76 finding(s)
  • Total: 76 finding(s)

Deep dive: homeboy audit data-machine --changed-since 6ea31fe

Artifacts and drill-down
  • CI results artifact: homeboy-ci-results-data-machine-audit-quality-Linux-node24 contains immediate command JSON for this action invocation.
  • Observation artifact: homeboy-observations-data-machine-audit-quality-Linux-node24 contains exported Homeboy run history for deeper queries.
  • Drill-down: download the observation artifact, then run homeboy runs import <dir>, homeboy runs list, and homeboy runs findings <run-id>.
  • Artifacts are attached to the workflow run: https://github.com/Extra-Chill/data-machine/actions/runs/27551393404
Tooling versions
  • Homeboy CLI: homeboy 0.229.11+31568619
  • Extension: wordpress from https://github.com/Extra-Chill/homeboy-extensions
  • Extension revision: 67a50b29
  • Action: unknown@unknown

@chubes4 chubes4 merged commit 6db0d3e into main Jun 15, 2026
5 checks passed
@chubes4 chubes4 deleted the fix-processed-items-dupkey-race branch June 15, 2026 14:15

Development

Successfully merging this pull request may close these issues.

ProcessedItems->add_processed_item uses bare INSERT — concurrent scraper workers flood debug.log with duplicate-key errors (2,809/day)
