Summary
For datasets whose schema omits dtype, it is not clear what validation behavior should be considered normative across APIs.
A concrete example is hdmf_common/VectorData, which omits dtype in the schema. In Python HDMF, omitted dtype still has practical runtime restrictions based on build-time inference and backend handling, but those rules do not appear to be documented as schema-level validation requirements.
For another implementation such as MatNWB, this creates ambiguity about what should be accepted or rejected during validation.
Concrete problem in MatNWB
At the moment, the following is possible in MatNWB:
```matlab
types.hdmf_common.VectorData(data, types.hdmf_common.VectorData(data, 1))
```
This seems like it should be invalid, because the payload of a dataset is itself another HDMF container object rather than dataset data.
However, because MatNWB currently does not perform type validation when dtype is omitted, this is accepted.
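To make the question concrete, here is a minimal Python sketch of the payload check at issue. All names here are hypothetical (`Container` stands in for any HDMF container class such as `VectorData`); this is not the actual HDMF or MatNWB API, just an illustration of the proposed rule.

```python
class Container:
    """Hypothetical stand-in for an HDMF container type (e.g. VectorData)."""
    def __init__(self, name, data):
        self.name = name
        self.data = data

def is_valid_dataset_payload(value, allows_references=False):
    """Illustrative check for dataset data when the schema omits dtype.

    Proposed rule: omitted dtype means "no fixed primitive dtype is
    prescribed", but the payload must still be leaf/storable data;
    container objects are accepted only when the schema explicitly
    declares a reference dtype (allows_references=True).
    """
    if isinstance(value, Container):
        return allows_references
    if isinstance(value, (list, tuple)):
        return all(is_valid_dataset_payload(v, allows_references) for v in value)
    return isinstance(value, (bool, int, float, str, bytes))

inner = Container("inner", [1, 2, 3])
print(is_valid_dataset_payload([1, 2, 3]))                       # storable data: True
print(is_valid_dataset_payload(inner))                           # container payload: False
print(is_valid_dataset_payload(inner, allows_references=True))   # reference dtype declared: True
```

Under this rule, the nested `VectorData(data, VectorData(data, 1))` construction above would be rejected, because the inner container is not storable dataset data and `VectorData` declares no reference dtype.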
Question
What is the intended normative validation behavior for datasets with omitted dtype, especially concrete types such as VectorData?
More specifically:
- Should omitted dtype be interpreted as "no fixed primitive dtype is prescribed by the schema", while still requiring the dataset payload to be valid leaf/storable data?
- Should implementations reject HDMF container objects as direct dataset payloads unless the schema explicitly declares a reference dtype?
- Which parts of Python HDMF's current behavior for omitted dtype are intentional cross-API semantics, and which parts are implementation details of that API?
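For contrast, the HDMF specification language does have an explicit way to declare that a dataset's elements are references, via a compound `dtype` with `target_type` and `reftype`. An abbreviated sketch (the second type definition is illustrative, not an existing hdmf-common type):

```yaml
# As in hdmf-common: VectorData omits dtype entirely,
# so no primitive dtype is prescribed by the schema.
- data_type_def: VectorData
  doc: An n-dimensional dataset representing a column of a DynamicTable.

# Hypothetical dataset whose elements are object references;
# here container payloads would be expected and valid.
- data_type_def: ExampleReferenceData
  doc: Illustrative dataset of object references (not a real hdmf-common type).
  dtype:
    target_type: Container
    reftype: object
```

The existence of this explicit reference syntax is part of why an omitted dtype could plausibly be read as "storable data only, no references" — but the schema language docs do not currently say so.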
Why this matters
As a maintainer of another HDMF-based API, I need to know whether validation for omitted dtype should be:
- schema-level permissive: no explicit dtype constraint, but still reject obviously non-dataset objects
- Python-HDMF-compatible: mirror current inference/restriction behavior
- something else
Right now, the schema appears intentionally generic, but the practical validation semantics are underspecified for alternative implementations.
Request
Could the intended semantics for omitted dataset dtype be clarified, especially for VectorData and similar concrete types?
If there is already an intended rule, documenting it in the schema language docs and/or hdmf-common docs would help other implementations validate consistently.
This issue text was drafted with the help of Codex (GPT-5.4).