Add HDT read/write serialization support #55

Open
sihingkk wants to merge 2 commits into DataTreehouse:main from sihingkk:hdt

@sihingkk commented Jun 12, 2026

This adds HDT (Header Dictionary Triples) as a supported format for Model.read() and Model.write(), using the hdt crate.

What's included

  • Reading: m.read("file.hdt") (format inferred from the .hdt extension, or pass format="hdt"). Implemented by iterating the HDT dictionary and mapping into oxrdf terms; literals are parsed via oxrdf's N-Triples-style lexical forms, matching the hdt crate's conventions.
  • Writing: m.write("file.hdt", format="hdt"). The HDT four-section dictionary and triple bitmaps are built directly in memory and written straight to the output — no temporary file or intermediate N-Triples serialization. Literals with special characters are stored N-Triples-escaped, following the hdt crate's conventions.
  • reads()/writes() reject "hdt" with a clear error since it is a binary format.
  • Python tests in py_maplib/tests/test_hdt.py covering round-trips of IRIs, blank nodes, plain/typed/language-tagged literals, special characters, format inference, multi-graph behavior, and the error paths.
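
The format handling described in the list above could behave roughly like the following Python sketch. It is illustrative only and is not the code in this PR: the helper names, the extension table, and the format names other than "hdt" are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical lookup tables; the real mapping in maplib may differ.
EXTENSION_FORMATS = {".hdt": "hdt", ".nt": "ntriples", ".ttl": "turtle"}
BINARY_FORMATS = {"hdt"}


def resolve_file_format(path, format=None):
    """Pick the format for file-based read()/write().

    An explicit format wins; otherwise the file extension decides.
    """
    if format is not None:
        return format.lower()
    ext = Path(path).suffix.lower()
    if ext not in EXTENSION_FORMATS:
        raise ValueError(
            f"cannot infer format from extension {ext!r}; pass format=..."
        )
    return EXTENSION_FORMATS[ext]


def check_string_format(format):
    """Validate the format for string-based reads()/writes().

    Binary formats cannot be carried in a text string, so they are rejected.
    """
    name = format.lower()
    if name in BINARY_FORMATS:
        raise ValueError(
            f"{name!r} is a binary format; use read()/write() with a file path"
        )
    return name
```

Under these assumptions, resolve_file_format("file.hdt") returns "hdt" without an explicit format, while check_string_format("hdt") raises a ValueError.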

Notes and limitations

  • The hdt crate loads the entire HDT file into memory on read; there is no streaming.
  • HDT is read-only by design, so write always produces a fresh file.
  • Cargo.lock grows by the hdt crate's dependency tree.

All py_maplib tests pass (384 passed, 2 skipped).

@thenonameguy

> Writing: m.write("file.hdt", format="hdt"). The hdt crate builds its dictionary from N-Triples input, so writing goes through a temporary N-Triples serialization; literals with special characters are stored N-Triples-escaped, again following the hdt crate.

Please do this in memory, without a temporary file, so we don't need 2x the memory/disk space for |triples|.
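
For context, an in-memory build needs only the set of terms and the sorted ID triples, so no N-Triples text has to be materialized. The Python sketch below shows the general HDT layout (a four-section dictionary plus bitmap triples) in simplified form. It is not the Rust code in this PR: the function names are hypothetical, terms are plain strings, and a real HDT file additionally compresses the dictionary sections and packs the sequences and bitmaps into binary form.

```python
from itertools import groupby


def build_dictionary(triples):
    """Assign HDT-style integer IDs using the four dictionary sections.

    Terms used as both subject and object go into the shared section and
    get the lowest IDs in both the subject and the object ID space.
    `triples` must be a re-iterable collection such as a list.
    """
    subjects = {s for s, _, _ in triples}
    predicates = {p for _, p, _ in triples}
    objects = {o for _, _, o in triples}

    shared = sorted(subjects & objects)
    subject_only = sorted(subjects - objects)
    object_only = sorted(objects - subjects)

    s_ids = {t: i for i, t in enumerate(shared + subject_only, start=1)}
    o_ids = {t: i for i, t in enumerate(shared + object_only, start=1)}
    p_ids = {t: i for i, t in enumerate(sorted(predicates), start=1)}
    return s_ids, p_ids, o_ids


def encode_bitmap_triples(triples, s_ids, p_ids, o_ids):
    """Encode triples as two ID sequences and two bitmaps (SPO order).

    Subject IDs are implicit: a 1 in bits_y closes the predicate list of
    the current subject, and a 1 in bits_z closes the object list of the
    current (subject, predicate) pair.
    """
    ids = sorted({(s_ids[s], p_ids[p], o_ids[o]) for s, p, o in triples})
    seq_y, bits_y, seq_z, bits_z = [], [], [], []
    for _, by_subject in groupby(ids, key=lambda t: t[0]):
        pairs = [
            (p, [o for _, _, o in grp])
            for p, grp in groupby(by_subject, key=lambda t: t[1])
        ]
        for pi, (p, objs) in enumerate(pairs):
            seq_y.append(p)
            bits_y.append(1 if pi == len(pairs) - 1 else 0)
            for oi, o in enumerate(objs):
                seq_z.append(o)
                bits_z.append(1 if oi == len(objs) - 1 else 0)
    return seq_y, bits_y, seq_z, bits_z
```

In this sketch the only extra memory beyond the triples themselves is the three term-to-ID maps and the sorted list of ID triples.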

curiouspresence and others added 2 commits June 12, 2026 15:14
Read uses the hdt crate's dictionary iteration mapped into oxrdf quads,
with literals parsed via oxrdf Literal::from_str.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@thenonameguy

fixes #52
