
fix: scope /i flag to keywords only — prevent lowercase false-positives #35 #36

Merged: four-bytes-robby merged 1 commit into main from test/35-english-false-positive-regression on Jun 12, 2026

@four-bytes-robby


Problem

The reference regex used the /gi flags, so the [A-Z0-9] class in the capture group also matched lowercase letters. As a result, common English text like "<REFERENCE_1>" was falsely detected as a reference number. For example, Ticket matched the keyword and the following word "was" matched the value capture, because /i made [A-Z0-9] accept the lowercase 'w'.

Fix

Changed the flags from /gi to /g and wrapped the keyword alternation group in (?i:...) so that case-insensitivity applies only to the keywords. The value capture [A-Z0-9] now correctly requires uppercase letters or digits.

Before: /(?:Referenz|...|Ticket(?:-?nummer)?)\b\s*[:#]?\s*([A-Z0-9][...]{2,40})\b/gi
After: /(?i:Referenz|...|Ticket(?:-?nummer)?)\b\s*[:#]?\s*([A-Z0-9][...]{2,40})\b/g
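The effect of scoping the flag can be reproduced in isolation. The following is a minimal Python sketch, not the project's code: `re.IGNORECASE` stands in for the /i flag, `finditer` stands in for /g, and because the keyword list and value class are elided ("...") above, the two keywords and the `[A-Z0-9-]` continuation class used here are illustrative assumptions. The helper name `find_references` is also hypothetical.

```python
import re

# Simplified stand-ins for the patterns in this PR. The real keyword list and
# the value character class are elided ("...") in the description, so the two
# keywords and the "[A-Z0-9-]" continuation class below are assumptions.
KEYWORDS = r"Referenz|Ticket(?:-?nummer)?"
VALUE = r"([A-Z0-9][A-Z0-9-]{2,40})"

# Before: IGNORECASE applies to the whole pattern, including the value capture.
BEFORE = re.compile(rf"(?:{KEYWORDS})\b\s*[:#]?\s*{VALUE}\b", re.IGNORECASE)

# After: (?i:...) limits case-insensitivity to the keyword group only.
AFTER = re.compile(rf"(?i:{KEYWORDS})\b\s*[:#]?\s*{VALUE}\b")


def find_references(pattern, text):
    """Return every captured value in text (hypothetical helper)."""
    return [m.group(1) for m in pattern.finditer(text)]


prose = "The ticket was closed yesterday."
real = "ticket: ABC-12345"

print(find_references(BEFORE, prose))  # lowercase word captured as a value
print(find_references(AFTER, prose))   # no match once /i is scoped
print(find_references(AFTER, real))    # lowercase keyword still matches
```

Under these assumptions, the old pattern captures `was` from the prose sentence, while the scoped pattern returns nothing for it and still captures `ABC-12345` after a lowercase keyword.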

Added

  • Regression test for 12 common English/tech words that must NOT trigger PII detection: ticket, chat, teams, fallback, reference, account, key, case, id, session, message, token
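A regression check of this shape might look like the sketch below. It is written in Python for illustration and is not the repository's test: the pattern is the same simplified stand-in (the real keyword list and value class are elided in this PR), and the sentence template and the `false_positives` helper are assumptions. With this reduced keyword list, only "ticket" would actually exercise the original bug.

```python
import re

# Hypothetical simplified pattern: the real keyword list and value class are
# elided in the PR description, so these are illustrative assumptions.
PATTERN = re.compile(
    r"(?i:Referenz|Ticket(?:-?nummer)?)\b\s*[:#]?\s*([A-Z0-9][A-Z0-9-]{2,40})\b"
)

# The 12 words listed in the PR description.
WORDS = ["ticket", "chat", "teams", "fallback", "reference", "account",
         "key", "case", "id", "session", "message", "token"]


def false_positives(words, pattern=PATTERN):
    """Return the words whose sample sentence produces a match."""
    hits = []
    for word in words:
        sentence = f"The {word} was updated."  # assumed sentence template
        if pattern.search(sentence):
            hits.append(word)
    return hits


# With the scoped flag, none of the words should be reported.
print(false_positives(WORDS))
```

Passing a pattern compiled with a global `re.IGNORECASE` flag instead shows the kind of failure the regression test is meant to catch.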

Test

  • All 101 tests pass (36 regex + 65 other)
  • The new regression test failed before the fix was applied, confirming that it catches the bug

Closes #35

@four-bytes-robby four-bytes-robby merged commit 9c0ac84 into main Jun 12, 2026
2 checks passed
@four-bytes-robby four-bytes-robby deleted the test/35-english-false-positive-regression branch June 12, 2026 12:00
Development

Successfully merging this pull request may close these issues.

[TEST] Add regression tests for English/tech word false-positives
