Keep initials from splitting a supplementary code point by alhudz · Pull Request #1722 · apache/commons-lang

alhudz · 2026-06-20T13:59:10Z

Repro: WordUtils.initials("Ben 😀mile Lee") where the second word begins with U+1F600.
Cause: the loop copies the first char after a delimiter and skips the rest of the word. A word that starts with a supplementary code point therefore keeps only the high surrogate; the low surrogate is dropped, leaving a lone surrogate in the result (B + U+D83D + L).
Fix: copy the trailing low surrogate together with its high half, and size the buffer to the input length so a two-char initial cannot run past it. BMP input is unchanged.
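
A minimal sketch of the failure and one way around it, for illustration only. It is not the Commons Lang source: the class `InitialsSketch` and both methods are hypothetical, whitespace is assumed to be the only delimiter, and the safe variant walks the string by code point rather than appending the trailing low surrogate explicitly as the fix above describes.

```java
final class InitialsSketch {

    // Hypothetical char-based loop: reproduces the bug described above.
    static String naiveInitials(final String str) {
        final StringBuilder buf = new StringBuilder(str.length());
        boolean lastWasGap = true;
        for (int i = 0; i < str.length(); i++) {
            final char ch = str.charAt(i);
            if (Character.isWhitespace(ch)) {
                lastWasGap = true;
            } else if (lastWasGap) {
                buf.append(ch); // copies only the high surrogate of a pair
                lastWasGap = false;
            }
        }
        return buf.toString();
    }

    // Hypothetical code-point-based loop: keeps both halves of a surrogate pair.
    static String initials(final String str) {
        // An initial may occupy two chars, so the buffer is sized to the input length.
        final StringBuilder buf = new StringBuilder(str.length());
        boolean lastWasGap = true;
        for (int i = 0; i < str.length(); ) {
            final int cp = str.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                lastWasGap = true;
            } else if (lastWasGap) {
                buf.appendCodePoint(cp);
                lastWasGap = false;
            }
            i += Character.charCount(cp); // advance 1 or 2 chars
        }
        return buf.toString();
    }

    public static void main(final String[] args) {
        final String input = "Ben \uD83D\uDE00mile Lee"; // second word starts with U+1F600
        final String broken = naiveInitials(input);
        final String fixed = initials(input);
        System.out.println(Character.isHighSurrogate(broken.charAt(1))); // true: lone surrogate
        System.out.println(fixed.codePointCount(0, fixed.length()));     // 3
    }
}
```

For BMP-only input the two loops should agree, which matches the note that BMP input is unchanged.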

garydgregory · 2026-06-20T14:03:57Z

@alhudz Please see my comment #1719 (review)

garydgregory · 2026-06-20T17:01:21Z

@alhudz
Would you please check the Commons Text version of this class for the same issue?
TY!

keep initials from splitting a supplementary code point

254ccda

garydgregory changed the title ~~keep initials from splitting a supplementary code point~~ Keep initials from splitting a supplementary code point Jun 20, 2026

garydgregory merged commit 6bef246 into apache:master Jun 20, 2026
20 of 21 checks passed

garydgregory added a commit that referenced this pull request Jun 20, 2026

Keep initials from splitting a supplementary code point (#1722).

181b80d

garydgregory merged 1 commit into apache:master from alhudz:wordutils-initials-surrogate
