Keep initials from splitting a supplementary code point #1722

Merged: garydgregory merged 1 commit into apache:master from alhudz:wordutils-initials-surrogate on Jun 20, 2026
Conversation

alhudz commented Jun 20, 2026

Contributor

Repro: WordUtils.initials("Ben 😀mile Lee") where the second word begins with U+1F600.
Cause: the loop copies the first char after a delimiter and skips the rest of the word, so a word that starts with a supplementary code point keeps only the high surrogate and the low half is dropped, leaving a lone surrogate in the result (B + U+D83D + L).
Fix: copy the trailing low surrogate together with its high half, and size the buffer to the input length so a two-char initial cannot run past it. BMP input is unchanged.
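
A minimal sketch of the fix described above, not the merged patch itself. The class name `InitialsSketch` is hypothetical, and it treats only whitespace as a delimiter, whereas the real `WordUtils.initials` also accepts custom delimiters.

```java
final class InitialsSketch {

    /** Returns the first code point of each whitespace-separated word. */
    static String initials(final String str) {
        if (str == null || str.isEmpty()) {
            return str;
        }
        final int strLen = str.length();
        // Sized to the input length so a two-char initial can never overflow.
        final char[] buf = new char[strLen];
        int count = 0;
        boolean lastWasGap = true;
        for (int i = 0; i < strLen; i++) {
            final char ch = str.charAt(i);
            if (Character.isWhitespace(ch)) {
                lastWasGap = true;
            } else if (lastWasGap) {
                buf[count++] = ch;
                // Keep the low surrogate together with its high half.
                if (Character.isHighSurrogate(ch) && i + 1 < strLen
                        && Character.isLowSurrogate(str.charAt(i + 1))) {
                    buf[count++] = str.charAt(++i);
                }
                lastWasGap = false;
            }
            // Otherwise: a later char of the current word, which is skipped.
        }
        return new String(buf, 0, count);
    }

    public static void main(final String[] args) {
        // "\uD83D\uDE00" is U+1F600 written as a surrogate pair.
        final String result = initials("Ben \uD83D\uDE00mile Lee");
        // 3 code points: B, U+1F600, L
        System.out.println(result.codePointCount(0, result.length()));
    }
}
```

Checking `Character.isLowSurrogate` on the next char before copying it means an unpaired high surrogate in the input is passed through as-is rather than swallowing the char that follows it.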

@garydgregory

Member

@alhudz Please see my comment #1719 (review)

@garydgregory garydgregory changed the title keep initials from splitting a supplementary code point Keep initials from splitting a supplementary code point Jun 20, 2026

garydgregory commented Jun 20, 2026

Member

@alhudz
Would you please check the Commons Text version of this class for the same issue?
TY!

@garydgregory garydgregory merged commit 6bef246 into apache:master Jun 20, 2026
20 of 21 checks passed
2 participants