Skip to content

Updated h5dump --xml#6394

Open
bmribler wants to merge 10 commits into
HDFGroup:developfrom
bmribler:update_xsdfile
Open

Updated h5dump --xml#6394
bmribler wants to merge 10 commits into
HDFGroup:developfrom
bmribler:update_xsdfile

Conversation

@bmribler
Copy link
Copy Markdown
Collaborator

@bmribler bmribler commented May 4, 2026

The location of the files HDF5-File.xsd and HDF5-File.dtd no longer exists, and the files were added to the repo (PR #5490). Before the --xml option is actually removed, running the h5dump --xml tests will fail. This PR updated h5dump and the expected output to use the correct files' location and actually completed the ticket HELP-2668.

The location of the files HDF5-File.xsd and HDF5-File.dtd no longer exists, and
the files were added to the repo (PR HDFGroup#5490).  Before the --xml is actually removed,
running the h5dump tests will fail.  This PR updated h5dump and the expected output
to use the correct files' location and actually completed the ticket HELP-2668.
hyoklee
hyoklee previously approved these changes May 4, 2026
@jhendersonHDF
Copy link
Copy Markdown
Collaborator

Before the --xml option is actually removed, running the h5dump --xml tests will fail.

@bmribler what did you mean by this?

@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="UTF-8"?>
<hdf5:HDF5-File xmlns:hdf5="http://hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hdfgroup.org/HDF5/XML/schema/HDF5-File http://www.hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd">
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a first glance, it seems to me like these should remain as web URLs rather than local files

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files are now in the repo. Should I use the GH repo develop branch for the location?

Copy link
Copy Markdown
Collaborator Author

@bmribler bmribler May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @brtnfld said in the standup, GH URL can be unreliable, but did we silently decide that we use it for now?

Copy link
Copy Markdown
Collaborator

@brtnfld brtnfld Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — using the GitHub raw URL is the right approach since the files now live in the repo. That keeps the output as a proper web URL (addressing Jordan's concern) while pointing to a location that actually exists. Something like:

https://raw.githubusercontent.com/HDFGroup/hdf5/develop/tools/test/h5dump/testfiles/xml/HDF5-File.xsd

https://raw.githubusercontent.com/HDFGroup/hdf5/develop/tools/test/h5dump/testfiles/xml/HDF5-File.dtd

The ../../testfiles/xml/ relative path in the current PR bakes a test-directory layout assumption into the production binary output, which would break any downstream consumer that tries to resolve the schema location outside of the test tree.

Comment thread tools/src/h5dump/h5dump.c
@@ -1597,8 +1597,8 @@ main(int argc, char *argv[])
"<%sHDF5-File xmlns:%s=\"http://hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd\" "
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But having said that, it seems this would need to be updated as well if changing the URLs

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just names, the files won't be looked for by anything, if I'm not mistaken. I was thinking about using the GH repo URLs, but waited to ask when this is brought up.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line sets the xmlns namespace URI (xmlns:hdf5="http://hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd"), which is an opaque identifier per the XML Namespaces spec — it does not need to resolve to an actual resource. Changing it would alter the namespace identity of the output and silently break any downstream tool that identifies HDF5 XML output by its namespace URI, since namespace identity in XML is defined by that URI value, not by where the schema file lives. Only the xsi:schemaLocation location hint needs to point to the actual file. So this line should stay as-is.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL that this comment was made on still needs to be changed. You can probably just substitute in the DEFAULT_XSD string at this point.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would likely be misuse of the namespace URI for downstream users to rely on it for anything, so I think it should be something less misleading since that URL doesn't point to anything useful and someone reading it may think it should.

@bmribler
Copy link
Copy Markdown
Collaborator Author

bmribler commented May 4, 2026

Before the --xml option is actually removed, running the h5dump --xml tests will fail.

@bmribler what did you mean by this?

@jhendersonHDF Thanks for looking at this. I'm not sure which part of that sentence you're asking about, so I explain the full context. --xml is deprecated, but before it is officially removed from h5dump, the tests should still work. Currently, because the location of those files no longer exist, tests will fail when those tests run. Please let me know if I have misunderstood anything.

@jhendersonHDF
Copy link
Copy Markdown
Collaborator

@jhendersonHDF Thanks for looking at this. I'm not sure which part of that sentence you're asking about, so I explain the full context. --xml is deprecated, but before it is officially removed from h5dump, the tests should still work. Currently, because the location of those files no longer exist, tests will fail when those tests run. Please let me know if I have misunderstood anything.

I was asking about the part mentioning those tests failing since we currently run the h5dump xml tests and they pass as they're only checking the diff output

@bmribler
Copy link
Copy Markdown
Collaborator Author

bmribler commented May 5, 2026

@jhendersonHDF Thanks for looking at this. I'm not sure which part of that sentence you're asking about, so I explain the full context. --xml is deprecated, but before it is officially removed from h5dump, the tests should still work. Currently, because the location of those files no longer exist, tests will fail when those tests run. Please let me know if I have misunderstood anything.

I was asking about the part mentioning those tests failing since we currently run the h5dump xml tests and they pass as they're only checking the diff output

Ah, ok. h5dump hardcoded the file names using the url to a non-existing location. I don't know when that matters in the tests, but it's definitely wrong.

hyoklee
hyoklee previously approved these changes May 18, 2026
@ajelenak ajelenak added this to the HDF5 2.2.0 milestone Jun 2, 2026
@brtnfld
Copy link
Copy Markdown
Collaborator

brtnfld commented Jun 3, 2026

Agreed it's wrong — the dead URL should be fixed. To clarify when it matters: the xsi:schemaLocation attribute is advisory for XSD-based files so most validators will silently ignore it if unreachable, but the DTD SYSTEM identifier (the <!DOCTYPE ... "url"> form) is normative — a validating XML parser will attempt to fetch it, so a broken URL there is a harder failure for anyone running validation downstream. Recommend updating DEFAULT_XSD and DEFAULT_DTD in h5dump.c to the GitHub raw URLs (see inline thread) and regenerating the fixtures, rather than the current local relative path.

@bmribler
Copy link
Copy Markdown
Collaborator Author

bmribler commented Jun 4, 2026

@brtnfld Yes, I fixed DEFAULT_XSD and DEFAULT_DTD in h5dump.c to the GH urls. It's just that I'm still missing something so having some test failures but I'll figure it out and push the changes soon.

@bmribler
Copy link
Copy Markdown
Collaborator Author

bmribler commented Jun 5, 2026

I made your change suggestions, @brtnfld

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: To be triaged

Development

Successfully merging this pull request may close these issues.

6 participants