Reject sign characters in UnicodeUnescaper hex values #1721

Merged
garydgregory merged 1 commit into apache:master from alhudz:unicodeunescaper-reject-sign
Jun 20, 2026

@alhudz alhudz commented Jun 20, 2026

Contributor

UnicodeUnescaper.translate reads the four-character value of a \uXXXX escape with Integer.parseInt(value, 16), which tolerates a leading sign.

  1. \u-047 decodes to U+FFB9 rather than being rejected, because parseInt("-047", 16) returns -71.
  2. An embedded sign such as \u02-3 already throws IllegalArgumentException, so accepting a leading sign is inconsistent.
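
Both cases follow from how Integer.parseInt treats sign characters. A minimal sketch of that behavior, independent of Commons Lang: the class and helper names are hypothetical, and the narrowing cast to char is an assumption that matches the U+FFB9 result reported above.

```java
final class ParseIntSignDemo {

    /** Hypothetical helper: parse as hex, then narrow to char as an escape decoder might. */
    static char parseAndNarrow(final String value) {
        return (char) Integer.parseInt(value, 16);
    }

    /** Returns true if Integer.parseInt rejects the value in radix 16. */
    static boolean isRejected(final String value) {
        try {
            Integer.parseInt(value, 16);
            return false;
        } catch (final NumberFormatException e) {
            return true;
        }
    }

    public static void main(final String[] args) {
        // A leading '-' is accepted and yields a negative int.
        System.out.println(Integer.parseInt("-047", 16));                // -71
        // Narrowing -71 to a 16-bit char wraps around to 0xFFB9.
        System.out.println(Integer.toHexString(parseAndNarrow("-047"))); // ffb9
        // An embedded '-' is not a hex digit, so parseInt throws.
        System.out.println(isRejected("02-3"));                          // true
    }
}
```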

Reject a leading +/- in the value field before parsing. The documented u+ notation (\u+0047) is unaffected.
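
One way such a check could look, as a rough sketch rather than the merged patch: Hex4Sketch and parseHex4 are hypothetical names, and the helper assumes it receives only the four-character value field, after any u+ prefix has already been consumed.

```java
final class Hex4Sketch {

    /**
     * Hypothetical helper, not the Commons Lang code: parses a four-character
     * hex field and refuses a leading sign before handing it to parseInt.
     */
    static int parseHex4(final CharSequence field) {
        if (field.length() != 4) {
            throw new IllegalArgumentException("Expected 4 hex digits: " + field);
        }
        final char first = field.charAt(0);
        if (first == '+' || first == '-') {
            // Integer.parseInt would accept this sign, so reject it up front.
            throw new IllegalArgumentException("Sign not allowed in hex field: " + field);
        }
        try {
            return Integer.parseInt(field.toString(), 16);
        } catch (final NumberFormatException e) {
            // Embedded signs and other non-hex characters end up here.
            throw new IllegalArgumentException("Not a hex field: " + field, e);
        }
    }

    public static void main(final String[] args) {
        System.out.println(parseHex4("0047")); // 71
        for (final String bad : new String[] {"-047", "+047", "02-3"}) {
            try {
                parseHex4(bad);
                System.out.println(bad + " accepted");
            } catch (final IllegalArgumentException e) {
                System.out.println(bad + " rejected");
            }
        }
    }
}
```

With this shape, leading and embedded signs fail the same way, which is the consistency the description asks for.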

Repro: new UnicodeUnescaper().translate("\\u-047")
Expected: IllegalArgumentException
Actual: returns a one-character string containing U+FFB9

@garydgregory garydgregory changed the title from "reject sign characters in UnicodeUnescaper hex values" to "Reject sign characters in UnicodeUnescaper hex values" Jun 20, 2026
@garydgregory garydgregory merged commit 4e1f3ee into apache:master Jun 20, 2026
20 of 21 checks passed
@garydgregory

Member

@alhudz
Would you please review UnicodeEscaper in Apache Commons Text?

garydgregory added a commit that referenced this pull request Jun 20, 2026
- Update changes.xml
- Javadoc
- Reduce vertical whitespace