Replace ConvertUTF with inline SI_UTF8 conversion #97

Merged: brofield merged 4 commits into master from inline-utf8-conversion on Jun 15, 2026
Conversation

@brofield

Owner

Summary

  • Remove vendored ConvertUTF.c/h and implement locale-independent UTF-8 conversion inline via SI_UTF8::Encode, SI_UTF8::Decode, and SI_UTF8::REPLACEMENT in SimpleIni.h
  • SI_CONVERT_GENERIC is now truly header-only — no extra source files to compile or link
  • Add differential tests comparing our codec against c32rtomb/mbrtoc32 for all assigned Unicode scalars U+0000..U+10FFFF, plus external invalid-UTF-8 fixtures and a UTF-8 INI integration test
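For readers unfamiliar with what a locale-independent codec involves, here is a minimal sketch of a UTF-8 encoder and decoder. It is not the code merged in this PR: the namespace `utf8_sketch`, the function signatures, and the `kReplacement` constant are hypothetical stand-ins, and the actual `SI_UTF8::Encode`, `SI_UTF8::Decode`, and `SI_UTF8::REPLACEMENT` in SimpleIni.h may differ in shape and error handling.

```cpp
#include <cstddef>
#include <string>

namespace utf8_sketch {  // hypothetical namespace, not part of SimpleIni

// U+FFFD, the conventional replacement character for undecodable input.
constexpr char32_t kReplacement = 0xFFFD;

// Appends the UTF-8 encoding of `cp` to `out`.
// Returns false and appends nothing for surrogates and values above U+10FFFF.
inline bool Encode(char32_t cp, std::string& out) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate halves
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return true;
}

// Decodes one scalar value from the start of [s, s + len).
// Returns the number of bytes consumed, or 0 if the input is empty,
// truncated, overlong, a surrogate, or otherwise malformed.
inline std::size_t Decode(const char* s, std::size_t len, char32_t& cp) {
    if (len == 0) return 0;
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) { cp = b0; return 1; }

    std::size_t need = 0;   // total bytes in the sequence
    char32_t min = 0;       // smallest value this length may encode
    char32_t value = 0;
    if ((b0 & 0xE0) == 0xC0)      { need = 2; min = 0x80;    value = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { need = 3; min = 0x800;   value = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { need = 4; min = 0x10000; value = b0 & 0x07; }
    else return 0;  // stray continuation byte or invalid lead byte
    if (len < need) return 0;  // truncated sequence

    for (std::size_t i = 1; i < need; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return 0;  // not a continuation byte
        value = (value << 6) | (b & 0x3F);
    }
    if (value < min) return 0;                             // overlong form
    if (value >= 0xD800 && value <= 0xDFFF) return 0;      // surrogate
    if (value > 0x10FFFF) return 0;                        // out of range
    cp = value;
    return need;
}

}  // namespace utf8_sketch
```

In this sketch a caller that wants lossy decoding would substitute `kReplacement` whenever `Decode` returns 0 and then skip one byte; whether the merged code resynchronizes the same way is not stated in this PR.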

Motivation

PR #95 proposed replacing ConvertUTF with the locale-dependent <uchar.h> APIs, which break under the "C" locale. This PR keeps the dependency reduction while preserving locale-independent behavior.
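The locale dependence can be probed with a small helper like the one below. This is an illustrative sketch only: `ConvertsUnderCurrentLocale` is a hypothetical name, and the result for non-ASCII input depends on the platform's C library and the active `LC_CTYPE` locale, so no particular outcome is claimed here.

```cpp
#include <climits>
#include <cstddef>
#include <cuchar>

// Hypothetical helper: reports whether std::c32rtomb can convert `cp`
// to a multibyte sequence under whatever locale is currently active.
inline bool ConvertsUnderCurrentLocale(char32_t cp) {
    char buf[MB_LEN_MAX];
    std::mbstate_t state{};  // zero-initialized initial conversion state
    const std::size_t n = std::c32rtomb(buf, cp, &state);
    return n != static_cast<std::size_t>(-1);  // (size_t)-1 signals failure
}
```

Calling this for U+20AC after `std::setlocale(LC_ALL, "C")` and again after selecting a UTF-8 locale can give different answers on the same machine, which is the behavior an inline codec avoids.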

Test plan

  • Full test suite passes (ctest)
  • Utf8Conversion.EncodeAndDecodeMatchSystemLibraryForAllAssignedScalars (~1.1M code points vs system lib)
  • utf8-reject.hex invalid sequences rejected
  • utf8-ini-roundtrip.ini load/save via CSimpleIniW

Made with Cursor

brofield and others added 4 commits June 15, 2026 12:31
Remove the vendored ConvertUTF sources and implement SI_UTF8::Encode/Decode
directly in SimpleIni.h so SI_CONVERT_GENERIC stays header-only. Add
differential tests against the platform C library and external invalid-UTF-8
fixtures.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Apple Clang provides <uchar.h> but not c32rtomb/mbrtoc32 in C++ builds.
Use iconv (UTF-32LE <-> UTF-8) as the system reference on macOS instead.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use \x wide-character escapes for expected Unicode values instead of UTF-8
source literals, which MSVC misinterprets without /utf-8. Enable /utf-8 for
the tests target and drop emoji from the mixed fixture to avoid wchar_t
UTF-16 vs UTF-32 representation differences.

Co-authored-by: Cursor <cursoragent@cursor.com>
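The two portability points in the last commit message can be sketched as follows. The names `kEAcute` and `WideUnitsFor` are hypothetical and do not come from the test suite; the sketch only illustrates why a `\x` escape is independent of source-file encoding and why characters outside the Basic Multilingual Plane occupy a different number of `wchar_t` units on different platforms.

```cpp
#include <cstddef>

// U+00E9 written as a hex escape. The compiler never has to guess the
// encoding of the source file, unlike a literal containing the raw character.
constexpr wchar_t kEAcute[] = L"\x00E9";

// Number of wchar_t units needed to hold one Unicode scalar value.
// Where wchar_t is 16 bits (UTF-16, as on Windows), values above U+FFFF
// need a surrogate pair; where it is 32 bits, one unit always suffices.
constexpr std::size_t WideUnitsFor(char32_t cp) {
    return (sizeof(wchar_t) == 2 && cp > 0xFFFF) ? 2 : 1;
}
```

An emoji such as U+1F600 therefore yields a different expected `wchar_t` string per platform, which is consistent with the commit's choice to drop emoji from the mixed fixture.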
@brofield brofield merged commit 1f878b3 into master Jun 15, 2026
5 checks passed
@brofield brofield deleted the inline-utf8-conversion branch June 15, 2026 03:11